JMP Discovery Summit Series: Abstracts
DoE-Assisted Chromatography Method Development for the Analysis of Biopharmaceuticals (2023-EU-30MP-1206)
Saturday, March 4, 2023
Cation-exchange chromatography (CEX) is the industry gold standard for the analysis of biopharmaceutical charge variants. However, the development of CEX methods in a time- and resource-efficient manner constitutes a bottleneck in product characterization. CEX separations are complex and governed by multiple factors. Several scientific publications have proven the successful application of design of experiments (DoE) in chromatography method development. Nevertheless, performing DoEs with a large number of factors may be challenging, time-consuming, and expensive. This work illustrates the use of a split-DoE approach to aid the development of a CEX method for the analysis of the charge variants profile of a mAb candidate. Analytical method development was intended to provide a high-throughput (HT) CEX method to support charge variants analysis with minimal sample and time requirements. The split-DoE approach is based on fundamental knowledge of the CEX separation mechanism and aims to reduce the number of experimental runs whilst exploring a wide experimental space. Regression modeling was used to study the effect of both individual process parameters and their interactions on the separation efficiency to ultimately identify the optimal method conditions. This study provides an efficient workflow for leveraging the development of CEX methods.

Hello, everyone. Thank you for joining my talk. I am Giulia Lambiase, a Senior Scientist at AstraZeneca. I work in biopharmaceutical development in the analytical sciences team. Today, I want to talk to you about the use of DoE for the development of analytical characterization methods, especially chromatography methods. In today's talk, I'm going to cover therapeutic proteins, what they are, and why they are challenging for analytical testing, introduce you to the use of design of experiments for analytical method development, and show the application of DoE to the development of a charge variants method, specifically a cation exchange chromatography method. To start off, protein therapeutics are inherently very complex due to their larger size and the presence of post-translational modifications, and also chemical modifications that the protein can undergo during expression in cells, purification, and storage. Monoclonal antibodies dominate the biopharmaceutical market, representing about 70% of the total sales of biopharmaceutical products. However, recently there has been a push for new products, next-generation biopharmaceuticals, which are bispecific antibodies, antibody fragments, fusion proteins, and many other formats. All of them come with unique challenges due to their complex structure and the presence of higher order structure, glycoforms, charge variants, disulfide bonds, oxidized and deamidated species, isomerization, aggregation, and fragmentation. All of these modifications, chemical and process [inaudible 00:02:37] modifications, can impact the potency, safety, and quality of the final drug product. This is why thorough analytical characterization and analytical testing throughout all stages of the product life cycle is key to meeting regulatory standards and delivering a product that meets the regulatory quality profile. We use a plethora of analytical techniques for analyzing proteins, and these are based mostly on chromatography methods, electrophoretic methods, and [inaudible 00:03:25]. Due to the inherent structural complexity of proteins, analytical method development can be quite challenging.
In today's talk, I'm going to focus specifically on chromatography methods and the use of design of experiments to help the development of chromatography separations. Chromatography methods can be quite complex, especially if you have a complex analyte like a protein. This is because the separation depends on the interplay of several variables, such as mobile phase composition, buffer pH, flow rate, column chemistry, temperature, and the type of detector that you decide to use for the analysis. All of these parameters need to be fine-tuned and controlled during the separation process in order to achieve the desired separation. DoE can be very useful compared with a one-factor-at-a-time approach. A one-factor-at-a-time approach involves varying one parameter at a time while keeping the others constant. This may lead to a large number of experimental runs and a lack of information, because factor interactions are not investigated. Lack of information also leads to additional experiments during method validation, which may lengthen the method development process even more and ultimately delay the overall product development. DoE, in comparison to one-factor-at-a-time approaches, enables the variation of multiple parameters at a time. This allows, with a reduced number of experiments, the investigation of a large number of factors, including the interactions between them, and also the development of mathematical models that allow the assessment of relevance and statistical significance and facilitate all the steps required during method validation. DoE really enables the investigation of a wide design space with fewer resources, so in a more efficient way. In fact, I like saying that DoE enables faster, cheaper, and smarter experiments to deliver stronger and better analytical methods. In today's talk, I'm going to walk you through a split-DoE approach for the development of a cation exchange chromatography method. Cation exchange chromatography is used for the analysis of charge variants. Specifically, if you look at the left-hand side of this slide, you can see a chromatogram of a protein with some acidic species here, [inaudible 00:07:17] on the left of the main species peak, and some basic species peaks. All these acidic and basic species can be formed due to the presence of chemical modifications that can lead to variations in the superficial charge distribution of the protein. Cation exchange chromatography methods are quite complex because the separation efficiency is affected by a number of factors and is quite sensitive to small changes in these factors, such as column chemistry, mobile phase pH, temperature, flow rate, salt content, and the time of the separation. In this approach, I'm going to walk you through an efficient way to develop a cation exchange chromatography method using DoE. If you are familiar with DoE, you may know that it often requires a sequential approach. In this experiment, I performed a main effects screening design to enable the selection of the best column chemistry and mobile phase pH for the charge variants separation of this specific mAb molecule. In the second DoE, I used response surface methodology, particularly a central composite design, to optimize the chromatography separation by changing the flow rate and [inaudible 00:09:36]. Let's look in more detail at the first DoE experiment.
This was a main effects screening design where I screened four column chemistries from four different providers, Agilent, Sepax, Phenomenex, and Waters, and a pH range from 5.5 to 6.5. My response was the experimental peak capacity, which is a parameter that tells you the efficiency of a chromatographic separation, precisely the number of peaks that can be separated within the chromatography time that you set. Other parameters, such as buffer concentration, salt concentration at the start of the chromatography gradient, flow rate, gradient time and shape, temperature, injection volume, concentration, and the UV absorbance, were kept constant. These are the results for the first DoE. On the left-hand side, you can see the results for the four different columns. You can see how the experimental peak capacity changes with the pH of the mobile phase in all four columns. We aim to have high experimental peak capacity values, and you can see that the Phenomenex column performed best. In all of these columns, we can see that a pH of 6.5 enables greater experimental peak capacities, but the Phenomenex column allowed for better separation results. This is also visible on the right-hand side of this slide in panel A. You can see, at pH 6.5, how the separation differs when using the different chromatography columns: Agilent, Waters, Phenomenex, and Sepax. Definitely, the separation of the charge variants using the Phenomenex column is much better than with the others, because these acidic peaks are very well separated, as well as these basic species here, from the main product peak. In panel B, we have isolated only the results of the Phenomenex column, showing how the chromatography separation looked with mobile phase pH 5.5, 6.0, and 6.5. We can see how the separation improves with the increase in pH. Obviously, the range of mobile phase pH is dictated by the intrinsic pI of the molecule, so we could only investigate this range; otherwise, the molecule would have struggled to bind to the column. Based on our fundamental knowledge of chromatography separation with cation exchange columns, we decided that these parameters, the Phenomenex column and a pH of 6.5, were optimal to carry development forward. We carried on with the second DoE using a central composite design. A central composite design is a type of DoE falling within the umbrella of response surface methodology, which is used to optimize conditions, investigate the presence of curvature, for instance, and extrapolate optimal values. In this case, we used our Phenomenex column and a mobile phase pH of 6.5, and started to play with other parameters, such as buffer concentration, salt concentration at the start of the gradient, and flow rate, to investigate optimal conditions. The central composite design enabled us, very efficiently and with a small number of runs, to identify the optimal separation conditions, the optimal method conditions. In fact, at the very end of the split-DoE approach, we could say that with the investigation of four columns, a mobile phase pH range, salt composition, gradient, and flow rate, in only 27 experimental runs, we could optimize a method for a monoclonal antibody. This method is very useful because it is now used as a quick, high-throughput analytical method for screening differences in the charge variants profile of this specific molecule expressed in different conditions, compared to a standard.
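To make the screening step concrete, here is a minimal sketch in Python of the kind of main-effects regression described above (peak capacity modeled against column chemistry and mobile phase pH). This is not the presenter's JMP workflow, and the peak-capacity numbers, factor levels, and column names in the table are invented placeholders for illustration only.

```python
# Hedged sketch of a main-effects screening analysis: peak capacity ~ column + pH.
# All response values below are placeholders, not the presenter's measurements.
import pandas as pd
import statsmodels.formula.api as smf

runs = pd.DataFrame({
    "column": ["Agilent", "Sepax", "Phenomenex", "Waters"] * 3,
    "pH":     [5.5] * 4 + [6.0] * 4 + [6.5] * 4,
    "peak_capacity": [18, 16, 25, 17, 21, 19, 31, 20, 24, 22, 38, 23],  # invented
})

# Main-effects model: peak capacity as a function of column chemistry and pH.
model = smf.ols("peak_capacity ~ C(column) + pH", data=runs).fit()
print(model.summary())

# Rank the screened factor settings by predicted peak capacity.
grid = runs[["column", "pH"]].drop_duplicates()
grid["predicted"] = model.predict(grid)
print(grid.sort_values("predicted", ascending=False).head(3))
```

In the talk, the winning column and pH from this screening step become fixed settings for the follow-up central composite design, which then varies buffer concentration, starting salt concentration, and flow rate.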
You can see here that the blue line is our reference standard and the red line is stressed material of the same molecule. You can see how the charge variants profile changed as a consequence of the stress condition applied to this molecule. This was achieved thanks to this analytical method, which was developed and optimized with a DoE approach. We also decided to implement this DoE approach as a platform workflow for analytical method development for new products, new biopharmaceuticals, and we screened a number of products. For all of them, we first applied the main effects screening design and identified the best column and mobile phase pH to use. Secondly, we applied the central composite design to optimize the separation. Now, we have identified a platform column and a mobile phase composition for this class of therapeutics. When a new molecule comes into the pipeline, we can very quickly, just by using a central composite design, which involves only 12 runs, optimize the chromatography profile and deliver an optimal cation exchange method for the specific product. The key take-home message from my talk today is that DoE-assisted method development, followed by appropriate statistical analysis, enables us to plan experiments very efficiently based on the time, cost, and analytical resources available, and to schedule the execution of experiments with adequate sample type and size to extract the maximum amount of information from our chemical data and efficiently address the challenges and goals of the intended research. It definitely saves time and cost in experiment execution in comparison to one-factor-at-a-time approaches. Most especially, it allows us to handle the complexity of analytical method development while still interrogating several factors at a time and studying the effect of both individual method parameters and their interactions on the dependent variable. With today's talk, I hope I inspired you to apply more DoE in your experiments. Thank you very much, everyone, for your attention. If you have any questions, feel free to reach out to me. Thank you.
Multivariate Analysis of Infrared Spectra Using Wavelet Model in FDE (2023-EU-30MP-1261)
Saturday, March 4, 2023
In catalyst development, tests to measure performance can be extremely time-consuming and expensive. In this study, we have explored the possibility of using faster and less expensive characterization data to validate mathematical models linking production parameters and performance. Specifically, we first modeled the production parameters against the performance data available in our dataset using generalized regression, and from this model, we predicted a set of optimal production parameters. We then analyzed the infrared spectroscopy (IR) data using the Wavelet model in Functional Data Explorer, plugging in the production parameters as additional inputs. Thanks to this addition, we were finally able to generate a synthetic spectrum for an optimal catalyst. The generated spectrum, combined with the predicted production parameters, can be used by the scientist to more quickly understand the underlying mechanisms driving performance. Finally, a new catalyst material developed using the predicted parameters can be analyzed using IR, and the synthetic spectrum can be used to validate the model.

Hello, I'm Chris Gotwalt with JMP, and my co-presenter, Giuseppe De Martino from Topsoe, and I are giving a presentation that tells the story of a data analysis project that showcases the new wavelet analysis in the Functional Data Explorer, one of the most exciting new capabilities in JMP 17. The case study begins with a product formulation problem where Topsoe wanted to design a catalyst that optimizes two responses, but the responses are in conflict with one another, in that improving one response often comes at the expense of the other. A candidate for the optimal tradeoff was found with models fit by the generalized regression platform, and the optimal factor settings were found using the desirability functions in the profiler. This was a fairly standard DoE analysis. But in addition to the measured responses, NIR spectra were taken from some, but not all, of the sample batches. This is information that can be used to give a clue to the R&D team about what the chemical structure of the ideal formulation should look like. In addition to the GenReg model of the responses, we also used wavelet models as the basis functions of a functional DoE analysis of the spectra, using the DoE factors as inputs. We were then able to get a synthetic spectrum of the optimal formulation by plugging in the optimal factor settings found in the initial analysis of the two critical responses. Before going into the presentation, I want to point out that at the beginning of the project, Giuseppe was very new to JMP and didn't have a background in this type of statistical analysis. Giuseppe learned all he needed to do the analysis on his own after a couple of web meetings with me. This obviously shows he's a clever guy, but also that JMP makes learning how to do some very sophisticated data analysis projects quick and easy. Now I'm going to hand the show over to Giuseppe. Thank you, Chris. Here is some background about our project. It's a catalyst development project; therefore, we are developing many different recipes. Each recipe has a unique set of production parameters, and once the sample is prepared, we characterize it in our analysis lab at Topsoe. Finally, we do a performance test. During the performance test, we look for two values that here we call Response 1 and Response 2. In this specific case, we are trying to minimize Response 1 while maximizing Response 2.
That would lead us to this ideal space in the top left corner of the graph. But as you can see, the 55 samples that we've tested get stuck in the middle of the graph. That is because Response 1 and Response 2 are intercorrelated, meaning that improving one comes at the expense of the other. Therefore, we moved to JMP and looked at our response data and our characterization data to see if we can move away from this line in the middle of the graph. We identified two target areas that we want to reach, and together with Chris, we thought about using JMP to create a model that would connect the production parameters to the response values. Then we further looked into our infrared spectroscopy data to try to validate our model and to get some extra information about the target samples. Here is an overview of the data set. We have produced 112 samples. Each sample has a unique set of production parameters. We have analyzed all the samples using infrared spectroscopy, so we have one spectrum for each sample. Then we have used many other characterization techniques that we have in-house, which account for 21 more columns of data. Finally, we have tested half of the samples, and that accounts for the last two columns, which we called responses. At the beginning of our project, we actually wanted to include the infrared spectroscopy data in our larger data set, and that's why we wanted to use JMP, because now we have this new wavelet model possibility. That would enable us to include the principal components coming from the wavelet model in our data set, and we could use them to create models in JMP. But before we start our analysis, we need to have a look at the raw data to find outliers. We do that by clicking Analyze, then Multivariate Methods, Multivariate. Here we can select our production parameters and characterization data as Y columns, and we get this scatterplot matrix. This is an example of looking at just the production parameters for what we would identify as an outlier. We can see that there is a set of points in production parameter two that is far away from all the other points. Furthermore, we have background knowledge about these samples: we know they wouldn't be optimal for our catalyst development, so we decided to right-click and say "Rows, Hide and Exclude." We did this also with other points, looking at the scatterplot matrix of all the characterization data. Now that we have cleaned up the data, we can fit a model. We click Analyze and Fit Model, and we select the production parameters as variables in our model. We click Macros and Response Surface to create a second-order polynomial combination of these variables. Then we select our responses as Y values. Then we decided to use generalized regression. Here Chris can add some more info about why we decided to use this specific type of model. We used a quadratic response surface model because the design had three or more levels for each factor, so I knew that we would be able to fit curvature terms if necessary and also be able to fit quite a variety of different interaction terms. We used the generalized regression platform because it does model selection with non-Gaussian distributions like the lognormal. In my opinion, there aren't many reasons not to use the generalized regression platform if you have JMP Pro, because it is so easy to use while in many ways being so much more powerful for DoE analysis than the other options in JMP and JMP Pro. After that, we can select our distribution.
We know that the responses are going to be strictly positive, so we select the lognormal distribution, then we click Run, and we say no. In this slide, we can see that we have now created a model, but we also have the possibility of creating other types of models using different estimation methods. We decided to use best subset. Here, Chris can add some more words about it. Well, here we used best subset selection because the full model isn't terribly large, so why not try every possible subset of that full model and find the one that provides the absolute best tradeoff between accuracy and model simplicity? On the other hand, had there been eight or more factors, I would have used a faster algorithm like forward selection or pruned forward selection, because with larger base models, it would take a very long time to fit every possible submodel to the data. We're going to be using the AICc model selection criterion to compare GenReg models. The AICc allows you to compare models with different effects in them as well as different response distributions. With the AICc, smaller values are better, and the rule of thumb I use is that if a model has an AICc value that is within 4 of the smallest AICc value seen, then those two models are practically identical in quality of fit. If two models have AICc values within 10 of each other, then they are statistically similar to one another. The main point here is that if we have two models and their AICc's differ by more than 10, then the data are pretty strongly suggesting that the one with the smaller AICc is the better model to be working with. As with any individual statistic, you should view the AICc as a suggestion. If your subject matter experience strongly suggests one model over the other, you may want to trust your instincts and ignore the recommendation of the AICc. Once we have created this new model, we can see that the number of nonzero parameters has now dropped from 16 to nine. If we want to check whether the model has improved, we can look at the AICc values. We can see that there is an improvement of more than 10, which is an important difference. Therefore, we decided to go with the best subset model. We did the same for Response 2, and then we moved on. Now we can click on the red arrow and select Profiler. In the Profiler, we can play around with the production parameters and see how the model expects Response 1 and Response 2 to change. This is already a great tool for the scientist to understand how the model expects the responses to vary, but we can do more. We can click on Optimization and Desirability, then Desirability Functions. Since from slide one we know that we have two targets that we want to reach, we can change the desirability functions to match those targets. So we double-click on the desirability function, say Match Target, and select the target area that we want to reach. Finally, we can click again on Optimization and Desirability and say Maximize Desirability. Here, the Profiler will try to reach the optimal points for the production parameters. To summarize, we can say that we now have the first model, going from production parameters to responses, and we have set two targets that we want to reach. This way we can get ideal production parameters that we can communicate to the development team, which they can use to move on in their research. In the second part of the presentation, I'm going to talk about how we used the IR spectra.
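As a rough illustration of the AICc rule of thumb described above, here is a small Python sketch that compares a full quadratic model with a reduced one on a log-transformed response. This is only an analogy: the data are simulated placeholders, the log transform stands in loosely for GenReg's lognormal fit, and the AICc formula shown is the usual small-sample correction rather than JMP's exact output.

```python
# Hedged sketch: compare two candidate models by AICc and apply the 4/10 rule of thumb.
# Data, factor names, and model terms are illustrative placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(0, 1, size=(40, 3)), columns=["p1", "p2", "p3"])
df["response"] = np.exp(0.5 + 1.2 * df.p1 - 0.8 * df.p2 + rng.normal(0, 0.1, 40))

def aicc(fit):
    """AIC plus the usual small-sample correction: 2k(k+1)/(n - k - 1)."""
    k, n = fit.df_model + 1, fit.nobs      # +1 for the intercept
    return fit.aic + 2 * k * (k + 1) / (n - k - 1)

full = smf.ols(
    "np.log(response) ~ (p1 + p2 + p3)**2 + I(p1**2) + I(p2**2) + I(p3**2)", df).fit()
reduced = smf.ols("np.log(response) ~ p1 + p2", df).fit()

delta = aicc(full) - aicc(reduced)
print(f"AICc full {aicc(full):.1f}, reduced {aicc(reduced):.1f}, difference {delta:.1f}")
# Rule of thumb from the talk: within 4 -> practically identical fits, within 10 ->
# statistically similar, more than 10 apart -> prefer the smaller-AICc model.
```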
Here we have a file for each spectrum, and therefore we need to click on File and Import Multiple Files. Then we need to specify that we want the file name to be a column in the data set, so we can use this sample name, which is the name of the file, as an ID to connect it to the other table where we have all the data. We click on the column and say Link Reference. Now that the two tables are connected, we can click Analyze, Specialized Modeling, Functional Data Explorer. In the Functional Data Explorer, we want to use the intensity value as the Y output. We can use the sample name, the file name, as our ID function. Then, and this is very important, we use the production parameters as supplementary data. The wavenumber is, of course, the X axis. And we say okay. Here we can see that the data is already clean. We've imported all the spectra, and the data looks clean because I've done the preprocessing outside of JMP. I used Python because I'm more familiar with that, and there is a very nice module that is able to remove the background and reduce the range that we want to look at. JMP was good to work with as an extra tool after this preprocessing. Then we decide to click on Models and Wavelets. We move from discrete data to continuous data. Now we can also look at the diagnostic plots. This is for you, Chris, to talk about. It's a good idea to look at actual-by-predicted plots as you proceed through a functional data analysis. These have the actual observed values in the data on the Y axis and the predicted values on the X axis. We want the predicted values to be as close to the actual values as possible. A plot like this one that is tight along the 45-degree line indicates that we have a good model. Now, some of you may be concerned about overfit since the predictions fit the data so well. In my experience, I haven't found that to be a problem in the basis function fitting and functional principal component steps of a functional DoE analysis. I'd also like to point out that in JMP 17, we've added a lot of new features for spectral preprocessing, like standard normal variate, multiplicative scatter correction, and Savitzky-Golay filters. Those of you that don't know Python have access to these capabilities in JMP Pro 17. After that, we can also have a look at the functional PCA analysis. Here, we'll spend some more words on it. After the wavelet model is fit, JMP Pro automatically does a functional principal components analysis of the wavelet model. This decomposes the spectra into an overall mean spectrum and a linear combination of shape components and coefficients that are unique to each sample spectrum. When we do the functional DoE analysis, GenReg automatically fits models to these coefficients behind the scenes and combines the resulting model with the mean function and the shape components to predict the shape of the spectra at new values of the input parameters. If we look at the principal component analysis, we can see that the wavelet model has created a mean function of all the spectra that we've set as input, and then it has created different shape functions. We decided to stop at six. What these shape functions describe is the variation of the data that we are analyzing. As you can see from the left, the first shape function accounts for 72% of the variation, while the second accounts for 22%. Together, all six account for 99.5% of the variation.
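To give a feel for the wavelet-plus-functional-PCA idea described above, here is a minimal Python analogy: compress each spectrum into wavelet coefficients, then extract a mean and a few "shape functions" with PCA. This is not JMP Pro's implementation, and the spectra, peak positions, and sample counts below are synthetic placeholders.

```python
# Hedged sketch of the wavelet + functional-PCA idea on synthetic spectra.
import numpy as np
import pywt
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
wavenumbers = np.linspace(4000, 400, 1024)

def peak(center, width, height):
    """A Gaussian band, used only to fabricate example spectra."""
    return height * np.exp(-((wavenumbers - center) / width) ** 2)

# Synthetic spectra: shared bands with sample-to-sample intensity variation and noise.
spectra = np.array([
    peak(3737, 30, 1 + 0.3 * rng.standard_normal())
    + peak(1650, 80, 2 + 0.5 * rng.standard_normal())
    + 0.02 * rng.standard_normal(wavenumbers.size)
    for _ in range(30)
])

# Wavelet-compress each spectrum (coefficients flattened into one vector per sample).
coeffs = np.array([np.concatenate(pywt.wavedec(s, "db4", level=4)) for s in spectra])

# PCA on the wavelet coefficients: the mean plays the role of the mean function,
# the components play the role of the shape functions described in the talk.
pca = PCA(n_components=6).fit(coeffs)
print("variance explained by 6 components:", pca.explained_variance_ratio_.sum())
```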
As an example, we can look at principal component 2, and we can see that around 3,737 there is a reduction, there is a minus. That means that increasing principal component 2 will decrease the peak at 3,737. This is just an example to say that already from these six shape functions we can get a lot of information, if we have subject matter knowledge about infrared spectroscopy and this catalyst system. Already here, we spent quite a lot of time looking at the principal components, but this is not what we are going to focus on in the next slides. What we want to look at instead is the functional DoE analysis. Here we have a profiler as well, but we can now plug in the production parameters that we got from the first model that we developed. Therefore, knowing the target production parameters that we want to use, we can generate a fake spectrum, or a synthetic spectrum, we can call it. This is a spectrum of a sample that was never produced. It could be wrong, but it can give some ideas to the scientists about what you would expect to get from these new production parameters. To sum up, we can say that we now move from production parameters to an infrared spectrum. We have a second model that uses the wavelet model to generate a synthetic spectrum. I imagine this to be like when you're baking a cake: now you have the recipe, but you also have a snapshot picture of the final cake. It doesn't explain how to do it, but it adds information about what you want to achieve. Finally, we can move from model to test. That means that we can give the R&D team a new recipe and also the synthetic spectrum of that recipe. Together with this information, they can try to develop the new catalyst and see if the model is validated or is wrong. Another thing the group can do is look back at the previous samples. We have half of the samples that were not tested; those are the black dots in the slide. We can look for outliers. Is there a sample that could perform really well that we haven't looked at? We actually have one, so that's another test we can do. Looking at future work, I added this slide that was at the beginning just to say that we focused on the production parameters and the IR spectra, but we haven't really looked at these 21 more columns of characterization data. In the future, we could spend some time trying to identify the most predictive parameters in these 21 columns, maybe create a new model from this characterization data to the responses, and use this model as a screening model to avoid testing samples that would not perform as well. That's the end of the presentation.
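The functional DoE step, generating a synthetic spectrum at the predicted optimal production parameters, can be sketched in the same spirit. The Python example below regresses each principal-component score on the production parameters and then reconstructs a spectrum at a chosen setting. It is self-contained and works directly on synthetic spectra (skipping the wavelet step for brevity); everything, including the "optimal" parameter values, is an invented placeholder, not the presenters' model.

```python
# Hedged sketch of the functional-DoE idea: parameters -> PC scores -> synthetic spectrum.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n, m = 30, 512
params = rng.uniform(0, 1, size=(n, 3))            # production parameters (placeholders)
x = np.linspace(4000, 400, m)
base = np.exp(-((x - 1650) / 80) ** 2)
effect = np.exp(-((x - 3737) / 30) ** 2)
spectra = (2 * base[None, :]
           + params[:, [0]] * effect[None, :]       # parameter 1 drives one band
           + 0.02 * rng.standard_normal((n, m)))

pca = PCA(n_components=4).fit(spectra)
scores = pca.transform(spectra)

# One regression model per principal-component score, with the parameters as inputs.
score_models = [LinearRegression().fit(params, scores[:, j]) for j in range(scores.shape[1])]

# Synthetic spectrum at a hypothetical "optimal" parameter setting.
optimal = np.array([[0.8, 0.2, 0.5]])
predicted_scores = np.array([[mdl.predict(optimal)[0] for mdl in score_models]])
synthetic_spectrum = pca.inverse_transform(predicted_scores)[0]
print(synthetic_spectrum.shape)
```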
Development of Predictive Single Event Latchup Model
Saturday, March 4, 2023
Principal Technology and Development Engineer, Microchip Technology Rousset. To address the new space market focused on cost reduction, the use of COTS (commercial off-the-shelf) products is a good option. To be compliant with space reliability requirements, COTS must be evaluated and modified to meet space agency specifications, especially on Single Event Latchup (SEL). This effect occurs when a heavy ion strikes the circuit. The energy transmitted to the matter (LET) triggers the parasitic thyristor and induces the latchup. The SEL sensitivity is characterized by the LET threshold and the holding voltage criteria. Until now, to evaluate the LET threshold, TCAD simulation and experimental tests were performed, but TCAD is time consuming and irradiation sessions are very expensive. An analytical model of the LET threshold is a solution to obtain a quick estimation at lower cost and could help to harden the product against radiation. JMP is well adapted to assist in building and analyzing the results of a design of experiments (DOE). Its Profiler is well suited to finding the best combination of the inputs to meet the LET threshold criteria.

Hello, I am Laurence Montagner. I'm working for Microchip Technology Rousset, in the Aerospace & Defense Group. I'm going to show you how we use JMP to develop a predictive single event latchup model. This project, called SELEST, started four years ago to develop an internal SEL prediction tool. This work was funded by CNES, the French space agency. Two posters were presented at the RADECS conferences in 2019 and 2021. The first poster was about a new approach and the feasibility of using an analytical model; that's what we are talking about today and what I'm going to present. The second poster was about a more accurate model using a neural network approach. For the context: to address the new space market, which is low cost, our circuits are going to be launched into low Earth orbit. We are going to reuse COTS, commercial off-the-shelf circuits, and make them radiation tolerant to meet space agency specifications. We need to analyze a lot of products to know if we can make them [inaudible 00:01:33] tolerant. We need a model and a predictive tool to save time and money before any experimental tests. When they are sent into space, those circuits are under radiation. We have several sources of radiation: the sun, cosmic rays, and the Van Allen belts. The sun and cosmic rays emit electrons, protons, and ions. As for the Van Allen belts, the inner belt emits protons and the outer one emits electrons. Those particles, when they strike our circuits, cause damage. We have two families of damage. One is TID, the total ionizing dose; we don't talk about it today. We are going to focus on SEE, single event effects, and more specifically on single event latchup. This single event latchup leads the component to destruction. That's why we absolutely need to predict this phenomenon. The mechanism of the single event latchup is very similar to electrical latchup, but it is not provoked by the same causes. In single event latchup, a heavy ion strikes sensitive devices such as inverters [inaudible 00:03:33]. In the worst case, it strikes in the middle of these devices and triggers a parasitic thyristor, composed here of NPN and PNP bipolars. When the supply of the circuit is above the Vhold of this parasitic thyristor, the thyristor stays on, and this can lead to destruction.
As you have understood, this Vhold parameter is very important for us. I just talked about energy, the energy of the heavy ions. There is a physical quantity that is very important, and it is a criterion used by the space agencies: the LET, linear energy transfer. This is the amount of energy lost in the matter per unit track length. For ESA, the European Space Agency, a circuit is said to be immune to latchup when this value is above 60 mega-electronvolt square centimeters per milligram. Our objective is to be able to predict the Vhold and the LET threshold of a circuit, so we need to build a model. We use TCAD Sentaurus to run simulations to build the model with a DOE. For the DOE, we need to define the inputs and outputs. As you have guessed, the outputs are Vhold and the LET threshold. Regarding inputs, for the first try, we decided to define four inputs: two from the process, epi thickness and epi dose, and two from the design, the length between the two wells of the test structure of the inverter, and the length between the top and the well. Keep in mind that if the Vhold obtained by simulation is above the supply of the circuit, then the circuit is immune to SEL. If the Vhold is below the supply of the circuit, then it is possible to have a single event latchup, and we are very interested to know the LET threshold. In the flow, we are going to build a DOE with JMP, a full factorial DOE. We are going to input this DOE into TCAD Sentaurus, run our simulations, and take the outputs, Vhold and LET threshold. We put all the results and inputs in JMP, then we screen the data and build a model with JMP. Now, let's go to the JMP data table that we used to study the feasibility of using an analytical model. This is the table with our four inputs and two outputs here, and the full DOE, one color per value. For each input here, we can see a preview of its distribution. To have a better display and a better exploration of the data, I begin by plotting the distributions of the inputs and outputs. We can check that, for each input, we have the same number of runs at each value, so it is okay. We can go to the end, and for the outputs we can see something to note quickly: the Vhold is below the supply of the circuit, which is 1.95. I can put the limit on the graph. We can have a single event latchup. We have the values of the LET threshold here. What we see is that some values are above 100. We can highlight them to see if we see something special on our other inputs. We can note quickly that the Body Tie at one value, whatever the other input values, is the source of the LET thresholds above 100. If we look at values of the LET threshold above 60, there is not the same effect; maybe it is the EpiThick. With this first analysis, we can explore our data and have an idea of what we have. The second graph to be plotted for this analysis is a very interesting plot, the Variability/Attribute Gauge Chart. We plot all our inputs in X and our outputs in Y. Now, we can first analyze whether we have main effects or interactions. We can connect cell means, and we can see that there is maybe a problem with the TCAD results. It is not a problem; we can continue our analysis and the study to know whether it is feasible or not to have the analytical model, because it is just one dot, but we must note what happened here. We saw it quickly with this kind of graph, whether there is a problem in our output and whether we have all our data for each condition.
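Before the JMP demo, the flow above builds a full factorial table of the four inputs and sends each row to TCAD Sentaurus. The short Python sketch below shows one way such a table could be generated; the input names follow the talk, but the level values, and the choice of three levels per factor, are placeholders rather than the real process and design values.

```python
# Hedged sketch of a full-factorial DOE table for the four inputs named in the talk.
from itertools import product
import pandas as pd

levels = {
    "EpiThick":   [1.0, 2.0, 3.0],   # placeholder levels
    "EpiDose":    [1.0, 2.0, 3.0],
    "Spacing_AC": [1.0, 2.0, 3.0],
    "BodyTie":    [1.0, 2.0, 3.0],
}

runs = pd.DataFrame(list(product(*levels.values())), columns=list(levels.keys()))
print(len(runs), "TCAD simulation runs")   # 3^4 = 81 runs with these placeholder levels
# Each row would be simulated in TCAD Sentaurus; the Vhold and LET-threshold outputs
# would then be joined back onto this table for screening and modeling in JMP.
```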
We have all the data, but this one point is to be analyzed. As for the Vhold, there is no problem with our output. We can see what we know from the physics: when the spacing between the two wells is higher, the Vhold increases. This result is consistent. If we look at the Body Tie, when the Body Tie increases, the Vhold decreases. That's what we know, too. Here we can see what we had already noticed in the previous analysis: for this value of Body Tie, for the LET threshold, we are at a value of 100 mega-electronvolts. That's what we already noted. We have a trend that when the [inaudible 00:12:07] increases, the LET threshold increases, and that is in agreement with what we know. It is interesting. We can say that with this graph on Vhold, we have a main effect of the [inaudible 00:12:29]. We can keep the same graph and the same analysis, but change the order of the inputs. Recall, better. Remove this input, recall, and we have the graph. What we can see on this graph, this representation, just by changing the order of the inputs, is that the EpiDose, for some values of the other inputs, has no effect for some conditions here. So we can deduce there is an interaction between inputs. For the LET threshold, there is a slight interaction here and no effect for other values. Now, we have checked that we have no problem with our data, and we have seen that we have some main factors. We can, out of curiosity, use another tool, a partition, to know which input is the first to appear. For the LET threshold, what we see is Body Tie first, that's what we have seen, and then the EpiThick. We can continue by [inaudible 00:14:24] Body Tie, too. Body Tie is very important. As for Vhold, the Body Tie is important, and after, the spacing is… Now that we have a more accurate idea of what we have in our data, we can go and build our model. We put in our inputs. I don't know what I have done; I removed our inputs. We have run a full factorial DOE. We try this model. We put in our outputs, and we can run it. We run the model. We can see that I missed something here. I already prepared something. Fit Model, EpiDose, Macros, Full Factorial. Okay, run. It's better now. Or is something wrong? I prefer this presentation with this plot, the Actual by Predicted Plot. In this plot, we can see that the model is not really satisfying, because we have a lot of dots far from the red area; even if the R-squared is at 0.81, it's not fully satisfying. For the Vhold, we have the same remark. We can have a look at the Profiler. The Profiler shows us the right trends, but we are not going to spend a long time on this. We are going to look at another model to see if we can have a more accurate model than this one. We try another one, not so far off. We take our inputs, try Response Surface, add our outputs, and run it. We remove EpiDose because there is no effect. We can see that this model is better; the R-squared is better. The Vhold model is a good model. Now, we have a look at the Profiler. In the Profiler, the trends are good for all parameters. We note that there is maybe something to do, because we see that in this part of the curve, the LET threshold values are higher, and that is not in agreement with the physics and what we know, of course. As I have already said, it is a first try. We need to keep working, as we've done on the inputs, on the accuracy of the LET threshold to have better results here. You can see now that with the EpiDose, we have almost no variation, just a little variation here.
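For readers who want to see the shape of the second model outside JMP, here is a hedged Python sketch of a response-surface style fit (main effects, two-way interactions, and quadratic terms, with EpiDose dropped as in the demo). The data frame is filled with fabricated numbers purely so the code runs; it does not reproduce the TCAD results or the R-squared values quoted in the talk.

```python
# Hedged sketch of a response-surface fit for Vhold and the LET threshold.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
doe = pd.DataFrame(rng.uniform(1, 3, size=(81, 3)),
                   columns=["EpiThick", "Spacing_AC", "BodyTie"])
# Placeholder outputs, loosely following the trends described in the talk.
doe["Vhold"] = 1.5 + 0.2 * doe.Spacing_AC - 0.1 * doe.BodyTie + rng.normal(0, 0.02, 81)
doe["LET_threshold"] = 40 + 15 * doe.BodyTie + 5 * doe.EpiThick + rng.normal(0, 3, 81)

rsm = ("(EpiThick + Spacing_AC + BodyTie)**2"
       " + I(EpiThick**2) + I(Spacing_AC**2) + I(BodyTie**2)")
vhold_fit = smf.ols("Vhold ~ " + rsm, data=doe).fit()
let_fit = smf.ols("LET_threshold ~ " + rsm, data=doe).fit()
print("R-squared:", round(vhold_fit.rsquared, 3), round(let_fit.rsquared, 3))
```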
However, this kind of graph is very interesting for us, because when we want to make a radiation-tolerant circuit, we can use it to help us by giving a set of values. Here we can use the Desirability Functions. Now, we are going to set our desirabilities. Here, we want to be above 60, the ESA criterion. I put 60, 80, and 100, match target. Here, for the Vhold, we want to maximize it; I put these values, 2 and 2.05. Now, I'm going to maximize the desirability. Of course, it works. And now, not. Match target. Target. Now, maximize this desirability. Now we have the set of values we wish, because here we have a range of values for the Body Tie. We can play with it, and we can play here with the spacing. The other representation and tool we like using in JMP is the Contour Profiler, with some fixed parameters. We can have a contour plot here. If you want to change a design value, I put the Body Tie and the Spacing_AC on the axes. We put our famous value of 60 for the LET, the Lo Limit at 60. To have a LET above 60, we know we can have a value of Body Tie up to, if we take the cross here, about 6.5, and we can use the range of the spacing. This tool, this representation, is very interesting for us to make our products radiation tolerant. This is not the last model we implemented, but I can show you that in the red triangle, we can save the prediction formula to the table so that afterwards we can take it and encapsulate it if we want. Okay, that's all for me on JMP. Let's go back to the presentation. With this method, we built another model, this one using a neural network. The prediction obtained with this model was compared to an experiment on a circuit. We can see that when the experiment shows there is no single event latchup, the prediction by SELEST, the internal tool, says the same thing: there is no latchup. When experimentally there is a latchup, SELEST, even if there is a difference between experiment and prediction, shows there is a latchup. For us, it was a good result, even if not so accurate. That's why the work continues on this model, to have a more accurate model, and that's why we continue working on the DOE. We do it per technology node to have better accuracy for this model. Okay, thank you for your attention.
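The desirability step above combines the two goals (LET threshold above the 60 MeV·cm²/mg criterion and a maximized Vhold). As a rough illustration of how such functions can be combined, here is a small Python sketch using generic Derringer-Suich style ramps; the numeric bounds follow the values set in the demo, but the function shapes and the geometric-mean combination are generic choices, not JMP's exact implementation.

```python
# Hedged sketch of desirability functions for the two SEL responses.
import numpy as np

def maximize_desirability(y, low, high):
    """0 below 'low', 1 above 'high', linear ramp in between."""
    return np.clip((y - low) / (high - low), 0.0, 1.0)

def overall_desirability(let_threshold, vhold):
    d_let = maximize_desirability(let_threshold, 60.0, 100.0)   # ESA criterion: stay above 60
    d_vhold = maximize_desirability(vhold, 2.0, 2.05)           # bounds used in the demo
    return np.sqrt(d_let * d_vhold)                             # geometric mean of the two

print(overall_desirability(80.0, 2.03))   # a candidate design point
print(overall_desirability(55.0, 2.10))   # fails the LET criterion -> desirability 0
```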
Approaches To Comparisons with JMP® 17 (2023-EU-30MP-1193)
Saturday, March 4, 2023
It is common to need to compare two populations with only a sample of each population. Statistical inference can help the comparison. Our presentation is about inference involving two hypotheses: the alternative hypothesis and the null hypothesis. Sometimes the goal is to provide sufficient evidence to decide that there is a significant difference between two populations. The goal at other times is to provide sufficient evidence that there is significant equivalence, non-inferiority, or superiority between two populations. These two situations require different tests. We will review these situations, appropriate hypotheses, and appropriate tests using common examples. Another common comparison is between two measurements of the same quantity. Our presentation will focus instead on the Method Comparison protocol for chemical and biological assays used by pharmaceutical and biotechnology development and manufacturing. We will present two methods that are available in JMP® 17 to assess the accuracy of a new test method against an established reference method. One method is known as Deming regression or Fit Orthogonal in JMP. The second method is known as Passing-Bablok regression. We will review the background of assessing accuracy and the unique nature of data from method comparisons, and demonstrate both regression methods with examples.

Hello, and welcome to our presentation, Approaches to Comparisons with JMP® 17. My name is Mark Bailey. I'm a Senior Analytics Software Tester, and my co-presenter today is Jianfeng Ding, a Senior Research Statistician Developer. Before we get into the new features, we're going to take a moment to make sure that everyone has the proper background to appreciate these new methods. This has to do with using statistical inference when we're trying to compare two populations. This is a very common task, and the comparison usually leads to a decision between two ideas about these populations. If we could observe the populations in their entirety, we wouldn't need statistics, but that's not usually the case, so we have to work with samples from the populations. Statistical inference can provide some really valuable information about those samples. In particular, is there sufficient evidence to reject one idea about the two populations? A clear statement of these ideas, or hypotheses, is essential to making the correct choice of test and also to the correct interpretation. So let's talk a little bit about the ideas or hypotheses that are part of these statistical tests. The alternative and null hypotheses, as they're known, represent mutually exclusive statements about these populations, and no other hypothesis is possible. For example, one statement might be that the means of population A and population B are equal. The other idea is that they're not equal. Those two ideas, or hypotheses, are mutually exclusive, and no other hypothesis is possible. What's the role of these two ideas? The alternative hypothesis states the conclusion that we would like to claim. It represents the populations, and it will require sufficient evidence in the data to overthrow the other hypothesis. The other one is called the null hypothesis. It states the opposing conclusion that must be overcome by strong evidence. It serves as a reference for this comparison, and it's assumed to be true. Now, currently, there's somewhat of a misunderstanding in hypothesis testing, in that people only think about the comparison in one direction.
That's because, historically, this is the way it was presented in training. The most often taught test is used to demonstrate that there's a difference between two populations. The resulting lack of understanding can lead to a misuse of these tests. To be clear, the choice of the test is not a matter of what data is collected or how the data is collected. It's entirely about the stated hypotheses for the purpose of your comparison. Let's look at these two possibilities. Let's say that the goal of our comparison is to demonstrate a difference. We want to explicitly state these ideas to make sure the test is clear. In the first example, I want to demonstrate that a temperature change causes a new outcome, that is, there is a difference. We'd like to claim that a new level of the response will result from a change in the process temperature. Perhaps we expect a higher yield, or we'd like to show more stability. A designed experiment is used to randomly sample from the population for a low-temperature condition and from a population for a high-temperature condition. The two hypotheses for this test: the null hypothesis states that the temperature does not affect the outcome. Remember, it's our reference, and we assume it to be true. The alternative hypothesis is our idea: temperature affects the outcome, but we can decide this only if the evidence is strong enough to reject the null hypothesis. Now, let's change that. Let's reverse that. Let's say that in this comparison, I want to demonstrate equivalence. In the second example, I want to show that a temperature change does not cause a change in the outcome; that is, the outcome, either way, is equivalent. We want to claim that a planned change in the process temperature will improve the yield but not affect the level of an impurity of the product. We design the same experiment and collect the same samples, but now our hypotheses are switched. The null hypothesis is that the temperature affects the outcome. It's our new reference, and we still assume it to be true, while the alternative hypothesis states that the temperature does not affect the outcome, the impurity level. But we can make that claim only if the evidence is strong enough to reject the null. So, do I test for a difference or for equivalence? The key is how you state your hypotheses. These two examples, I think you can see, use identical data but different tests. The choice of the test is not about the data; it's about the claim that we want to make and how we state that properly in the hypotheses. Remember that statistical tests are unidirectional; that is, we can reject a null or not. The aim of the test is to reject a null hypothesis with a high probability when it's false. Now, let's get to the new features in JMP® 17. Jianfeng will now present the equivalence tests, and when she's finished, I'll present method comparison. Now, I'm going to share my screen. Hello, my name is Jianfeng Ding. I'm a Research Statistician Developer at JMP. In this video, I'm going to talk about the equivalence, noninferiority, and superiority tests in JMP® 17. The basic hypothesis test on the left is a test that most quality professionals are familiar with. It is often used to compare two or more groups of data to determine whether they are statistically different. The parameter theta can be a mean response for a continuous outcome or a proportion when the outcome variable is binary. Theta T represents the response from the treatment group, and theta zero represents the response from a control group.
There are three types of basic hypothesis tests. The first one is a two-sided test, and the rest are one-sided tests. If you look at the two-sided test on the left, the null hypothesis is that the treatment means are the same, and the alternative hypothesis is that the treatment means are different. Sometimes we really need to establish that things are substantially the same, and the machinery to do that is called an equivalence test. An equivalence test is meant to show that the difference between theta T and theta zero is within a pre-specified margin delta, and it allows us to conclude equivalence with a specified confidence level. If you look at the equivalence test, the null hypothesis is that the treatment means are different, and the alternative hypothesis is that the treatment means are within a fixed delta of one another. This is different from the two-sided hypothesis test on the left. Another alternative testing scenario is the noninferiority test, which aims to demonstrate that results are not substantially worse. There is also a testing scenario called superiority testing that is similar to noninferiority testing, except that the goal is to demonstrate that results are substantially better. There are five different types of equivalence-type tests; which one to use depends on the situation, and this will be discussed next. These tests are very important in industry, especially in the biotech and pharmaceutical industries. Here are some examples. If the goal is to show that the new treatment does not differ significantly from the standard by more than some small margin, then an equivalence test should be used. For example, consider a generic drug that is less expensive and causes fewer side effects than a popular name-brand drug: you would like to prove it has the same efficacy as the name-brand one. The typical goal in noninferiority testing is to conclude that a new treatment, process, or product is not significantly worse than the standard one. For example, a new manufacturing process is faster; you would want to make sure it creates no more product defects than the standard process. A superiority test tries to prove that the new treatment is substantially better than the standard one. For example, a new fertilizer has been developed with several improvements, and the researchers want to show that the new fertilizer is better than the current fertilizer. How do we set up the hypotheses? The graph on the left summarizes the five different types of equivalence-type tests very nicely. This graph was created by our SAS/STAT® colleagues, John Castelloe and Donna Watts; you can find their white paper on this on the web. Choosing which test to use depends on the situation. For each of these situations, the region that we are trying to establish with the test is shown in blue. For an equivalence analysis, you can construct an equivalence region with upper bound theta zero plus delta and lower bound theta zero minus delta. You can conduct an equivalence test by checking whether the confidence interval of theta lies entirely in the blue equivalence region. Likewise, you can conduct a noninferiority test by checking whether the confidence interval of theta lies entirely above the lower bound if a larger theta is better, or below the upper bound if a smaller theta is better. These tests are available in JMP® 17 in Oneway for comparing normal means and in Contingency for comparing response rates. The graphical user interface of the equivalence test launch dialogs makes it easy for you to find the type of test that corresponds to what you are trying to establish.
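The equivalence logic described above is often implemented as two one-sided tests (TOST): reject both "the difference is at or below minus delta" and "the difference is at or above plus delta" to conclude equivalence. Here is a minimal Python sketch of that logic for two means with a pooled variance; the data are hypothetical stand-ins for two treatment groups, and JMP's report additionally provides the forest plot and the pooled/unpooled variance options mentioned above.

```python
# Hedged sketch of a two one-sided tests (TOST) equivalence check for two means.
import numpy as np
from scipy import stats

def tost_equivalence(x, y, delta, alpha=0.05):
    nx, ny = len(x), len(y)
    diff = np.mean(x) - np.mean(y)
    # Pooled-variance standard error of the difference in means.
    sp2 = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    se = np.sqrt(sp2 * (1 / nx + 1 / ny))
    dof = nx + ny - 2
    # H0a: diff <= -delta  and  H0b: diff >= +delta; reject both to conclude equivalence.
    p_lower = 1 - stats.t.cdf((diff + delta) / se, dof)
    p_upper = stats.t.cdf((diff - delta) / se, dof)
    p = max(p_lower, p_upper)
    return p < alpha, p

rng = np.random.default_rng(4)
drug_a = rng.normal(10.0, 2.0, 12)   # hypothetical measurements
drug_c = rng.normal(10.5, 2.0, 12)
print(tost_equivalence(drug_a, drug_c, delta=3.0))
```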
A forest plot in the report summarizes the comparison very nicely and makes it easy for you to interpret the results. Next, I'm going to do a demo of the equivalence test, superiority test, and noninferiority test. I'm going to use a data set called Drug Measurements that is in the JMP sample data library. Twelve different subjects were given three different drugs, A, B, and C, and continuous measurements were made. I first launch Fit Y by X and put the measurement as Y and drug type as the X factor. This brings up the one-way analysis. Under the red triangle menu, let's first find Equivalence Test; there are two options, Means and Standard Deviations. We're going to focus on Means for this example. This brings up the equivalence launch dialog. In this section, you can choose which test you would like to conduct, and the graph represents the choice of the selected test. For the superiority and noninferiority tests, there are two scenarios: one where a larger difference is better and another where a smaller difference is better. Which one to choose depends on the situation. You need to specify the delta, or the margin, for the test. You also need to specify the confidence level alpha for the test. You can choose either the pooled variance or the unequal variance to run the test. For this example, we run the equivalence test first and specify three as the difference. We are going to do the equivalence test for all the pairs. Click OK, and it brings up the equivalence test results. On the top is the statistical detail for the equivalence test, and at the bottom is a forest plot. You notice there are two regions: the blue region is the equivalence region, and the red regions are the non-equivalence regions. The lines here represent the confidence intervals of the mean differences between two groups. If we look at this line, this is the confidence interval of the mean difference between drugs A and C, and you see this line is completely contained inside this blue region. We look at the P value of the equivalence test, which is 0.02, smaller than 0.05. So at this 5% significance level, we can declare that drugs A and C are equivalent. But when you look at the confidence intervals of the mean difference between drugs A and B, and drugs B and C, they extend beyond this blue region. So at the 5% significance level, we cannot conclude that drugs A and B, or drugs B and C, are equivalent. Next, if we assume drug C is the standard drug and we would like to find out if drugs A and B are better than drug C, that makes us want to do a superiority test. Let me close this outline node for now; we launch the equivalence test again and click Means. This time we're going to run a superiority test. We click the superiority test, we prefer a larger difference, and we specify 0.4 as our margin. This time we need to set drug C as our control group and click OK. This brings up the superiority test. From this forest plot you can see that the confidence interval of the mean difference between drugs B and C is completely contained inside this blue region. So we conclude that at the 5% significance level, we can declare that drug B is superior to drug C. But we cannot make the same conclusion for drugs A and C. This concludes my first example. The next example will show how to conduct a noninferiority test for the relative risk between two proportions. Let me open the data table. A randomized trial compared drug FIDAX as an alternative to drug VENCO for the treatment of colon infections. The two drugs have similar efficacy and safety.
Two hundred twenty-one out of 225 patients treated with FIDAX achieved clinical cure by the end of the study, compared to 223 out of 257 patients treated with VENCO. We launch Fit Y by X again, plug in the cure outcome as Y and drug as the X factor with the count as a frequency, and click OK. This brings up the contingency analysis. The Likelihood Ratio test, Pearson test, and Fisher's Exact test all indicate that there is no significant difference between these two drugs. But we would like to find out whether drug FIDAX is not inferior to drug VENCO. We go to the top red triangle and bring up the equivalence test. There are two options: one is the risk difference and one is the relative risk. For this example, we choose relative risk. Again, this brings up the equivalence test launch dialog. For this example, we're going to run a noninferiority test. Again, we prefer larger ratios. We specify 0.9 as our margin, and because we care about the treatment effect, we choose Yes and then click OK. This brings up the noninferiority test. From the forest plot, we can see the confidence interval of the relative risk is completely contained in the blue region, and the P value for the noninferiority test is very small. So we conclude at the 5% significance level that drug FIDAX is not inferior to drug VENCO. This concludes my talk, and I will hand back to Mark. I need to stop sharing. Thank you, Jianfeng. Now, in the last part of our presentation, I'm going to talk about another comparison, where we want to compare the results or measurements from two different methods of measuring some quantity. We assume that there is a standard method that already exists. It has been validated, and we can use it to measure the level of some quantity; that might be the temperature or the potency of a drug. But for some reason, we've developed a new method for the same measurement, and we must compare its performance to the standard method before we use it. This is a long-standing issue. This comparison has been codified for a long time by numerous international organizations; I've listed a few of them on this slide. So this is a very well-studied and established comparison. In this case, we're going to compare to identity. We're going to compare these two methods, where ideally the test method would give us the same value as the standard method. So we plot the data using a scatter plot, with the test method on the vertical axis and the standard method result on the horizontal axis. We can even plot the identity line, where Y equals X, for reference. Ideally, we would get the same result from both methods, but that won't happen because of measurement error in both methods. We'll use regression analysis to determine the best fit for this line, where we have the test method versus the standard method. Then the estimated parameters for our model can be compared to the identity line. For the null hypothesis, we start with the idea that they are not the same: the intercept of this line is not zero, or the slope is not one, or possibly both. In other words, the results are not equivalent. The alternative, which we would like to claim, is that they are equivalent, so there we state that the intercept should be zero and the slope should be one. To make this comparison using regression, we have to postulate a model. In this case, it's a simple linear regression model: we have a constant term A and a proportional term B times X. We're going to estimate those parameters A and B and use our hypotheses to decide.
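As an aside on the relative-risk example just shown, the noninferiority check can be sketched numerically with the usual log relative-risk normal approximation. This is only an illustration of the logic (counts taken from the example, margin 0.9 as in the demo), not the calculation JMP performs internally, and the confidence level conventions may differ.

```python
# Sketch: noninferiority test for a relative risk (larger ratio is better),
# using the log relative-risk normal approximation.
import numpy as np
from scipy import stats

cure_t, n_t = 221, 225   # FIDAX (test drug)
cure_s, n_s = 223, 257   # VENCO (standard drug)
margin = 0.9             # noninferiority margin for the ratio

rr = (cure_t / n_t) / (cure_s / n_s)
se_log = np.sqrt(1 / cure_t - 1 / n_t + 1 / cure_s - 1 / n_s)
z = stats.norm.ppf(0.95)                              # one-sided 5% level
ci = np.exp(np.log(rr) + np.array([-1, 1]) * z * se_log)
# one-sided p-value for H0: RR <= margin vs H1: RR > margin
z_stat = (np.log(rr) - np.log(margin)) / se_log
p_value = 1 - stats.norm.cdf(z_stat)
print(rr, ci, p_value)   # noninferior if the lower CI bound exceeds 0.9
```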
The model also has a term epsilon, which represents the measurement error, the random variation. Using linear regression, we assume that the Y and the X are linearly related. We assume that the statistical errors, the epsilon, are in Y, not in X. We also assume that those errors are distributed in a way that is independent of the response; in other words, the statistical error is the same across the entire range of the method. We also assume that no data points exert any excessive influence on the estimates. Well, in method comparison, we usually violate these assumptions. First of all, there is measurement error in the standard method as well. Also, often the errors are not constant; that is, we observe a constant coefficient of variation but not a constant standard deviation. Outliers are present that can strongly influence the estimation. Other regression methods are required in such a case. Deming regression simultaneously minimizes the least squares error in both Y and X, which is appropriate for this case. Passing-Bablok regression is a nonparametric method based on the median of all possible pairwise slopes; it is resistant to outliers and nonconstant errors. Let's talk about each of these briefly. Deming regression is provided in the Bivariate platform through the Fit Orthogonal command. This has been available in JMP for many years. The Deming regression can estimate the errors in Y and X, assume that the errors in Y and X are equal, or use a given ratio of Y to X error. Passing-Bablok regression is new: JMP® 17 introduced this method in the Bivariate platform through the Fit Passing-Bablok command. This command also includes checks for the assumptions that the measurements are highly positively correlated and exhibit a linear relationship. Method comparison often includes a comparison by difference. The Bland-Altman analysis compares the pairwise differences as Y to the pairwise means as X to assess bias between the two values. The results are presented in a scatterplot of Y versus X for your examination and also to identify any anomalies. This occurs in the Matched Pairs platform. The Matched Pairs platform has been part of JMP for many years as well, but the Bland-Altman test is a new addition. The report also presents the hypothesis test. Now I'd like to demonstrate these two methods. As I said, Deming regression has been available for a long time, but for completeness' sake, I'm going to demonstrate it here alongside the new methods. I select Fit Y by X from the Analyze menu. I'm going to compare test Method 1 to my standard method. The standard method goes in the X role and the new test method goes in the Y role. You could evaluate more than one test method at the same time. Here's my plot. Initially, I see a scatter plot. I expect these two to agree very well: I expect to see that they follow this diagonal path, that they're linear, and so forth. I'll click the red triangle next to Bivariate and select Fit Orthogonal. In this case, I don't really know that the variances are equal, and I don't have any prior information with which I could specify a ratio, so I'll have JMP estimate the errors in both; I'll use the first option here. Now we have the fitted line using Deming regression. Below that we have the report. The report includes an estimate of the intercept; we can see it's small and close to zero. We have an estimate for the slope; using the confidence interval, we see that it includes one, which we would expect if the test method agrees with the standard.
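For readers who want to see the mechanics, here is a minimal sketch of Deming regression using the standard closed-form estimate for a given ratio of measurement-error variances. The simulated data and the helper name are hypothetical, and this is not the Bivariate platform's implementation (which also reports confidence intervals for the estimates).

```python
# Sketch: Deming regression slope and intercept for a given ratio of
# measurement-error variances lam = var(error in y) / var(error in x).
import numpy as np

def deming(x, y, lam=1.0):
    xm, ym = x.mean(), y.mean()
    sxx = np.sum((x - xm) ** 2)
    syy = np.sum((y - ym) ** 2)
    sxy = np.sum((x - xm) * (y - ym))
    slope = (syy - lam * sxx + np.sqrt((syy - lam * sxx) ** 2 + 4 * lam * sxy ** 2)) / (2 * sxy)
    intercept = ym - slope * xm
    return intercept, slope

rng = np.random.default_rng(1)
truth = rng.uniform(10, 50, 40)
std = truth + rng.normal(0, 1.0, 40)     # standard method, with its own error
test = truth + rng.normal(0, 1.0, 40)    # test method, with its own error
print(deming(std, test, lam=1.0))        # intercept near 0 and slope near 1 if the methods agree
```

With lam = 1 this reduces to orthogonal regression, which matches the "errors equal" option described above.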
That's Deming regression. Now we're going to take a look at the new methods, Passing-Bablok regression and Bland-Altman. Same start: select Analyze and then Fit Y by X. I'm going to use the Recall button here. I want to compare Method 1 to the standard, but I'm going to use a new regression technique. I'll click on the red triangle and select Fit Passing-Bablok. There are actually two lines here. There's a red line that represents the best-fit line using the Passing-Bablok regression, but there's also, for our reference, a line that represents where Y equals X. It's hard to see, so I'm going to use the magnifier tool to magnify a few times. Now you can see that there are, in fact, two separate lines: one is the identity and one is the fit, but they overlap quite a bit. These are quite similar. In the numerical reports, first I have a test for the high positive correlation. We're using Kendall's Tau, and we can see that it is highly significant; we reject the idea that they're not strongly correlated. Next, we have a test of linearity. This test assumes that they're linear, and we're looking for strong evidence against that. But we have a very high P value here, so we do not reject the assumption that they're linear. Finally, we have the parameter estimates from the Passing-Bablok regression: we have the point estimates and the interval estimates. For the intercept, the interval includes zero, so we cannot reject an intercept of zero. The slope is contained within an interval that includes one, so similarly, we can't reject that the slope of that line is equal to one. Let's say we'd also like to compare these two methods by difference. To do that, I click on the red triangle for the options of the Fit Passing-Bablok results, and here we see the command for the Bland-Altman analysis. It takes all the information here and launches Matched Pairs with the additional Bland-Altman analysis. The plot shows on the Y axis the pairwise difference between Method 1 and the standard method, plotted against the mean of those two values on the horizontal axis. The Bland-Altman analysis is helpful because it gives us an idea about the bias. Here I have an estimate of the bias of negative 0.113, but we can see that the interval estimate of the bias includes zero, so we can't reject the idea that the bias is equal to zero, and so on. Now we have in JMP® 17 a much more complete set of tools for comparing different test methods. That concludes our presentation. Thank you.
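The two ideas demonstrated above can also be sketched outside JMP. The first function below is a simplified median-of-pairwise-slopes fit: it conveys the intuition behind Passing-Bablok but omits the offset the full estimator uses to handle the pairwise-slope distribution, so treat it as illustrative only. The second computes the Bland-Altman bias and limits of agreement. Data and function names are hypothetical.

```python
# Simplified sketch: median-of-pairwise-slopes fit (the idea behind
# Passing-Bablok) and a Bland-Altman bias estimate with limits of agreement.
import itertools
import numpy as np

def pairwise_slope_fit(x, y):
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in itertools.combinations(range(len(x)), 2)
              if x[j] != x[i]]
    slope = np.median(slopes)
    intercept = np.median(y - slope * x)
    return intercept, slope

def bland_altman(x, y):
    diff = y - x
    bias = diff.mean()
    loa = bias + np.array([-1.96, 1.96]) * diff.std(ddof=1)  # limits of agreement
    return bias, loa

rng = np.random.default_rng(2)
std = rng.uniform(10, 50, 30)
test = std + rng.normal(0, 0.8, 30)
print(pairwise_slope_fit(std, test))   # intercept near 0, slope near 1
print(bland_altman(std, test))         # bias near 0 if the methods agree
```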
Labels: Advanced Statistical Modeling, Basic Data Analysis and Modeling, Consumer and Market Research, Data Exploration and Visualization, Design of Experiments, Mass Customization, Predictive Modeling and Machine Learning, Quality and Process Engineering, Reliability Analysis
Yield and Quality Issue Solving by Correlating Optical Inspection Step Results With Electrical Tests (2023-EU-PO-1343)
Saturday, March 4, 2023
In the semiconductor manufacturing industry for automotive, parts are tested at each manufacturing step to screen likely-to-fail parts. The further upstream the weak parts are scrapped, the lower the scrapping cost will be. But testing has a cost as well. A recent project at NXP sought to avoid manual classification of the defects observed at the wafer inspection level. Defects are now classified as killer or not-killer from a training image dataset, and a failure probability is assessed for each die. JMP® allows a further step in correlating this failure probability to electrical tests with three types of analysis. The first analysis assessed a failure probability threshold that caps the number of parts tested and thus the test cost. The second analysis highlighted the tests most correlated with failure probability. The final analysis used the list of highlighted tests to adjust test limits to screen the parts with failure probability outliers. The analyses limit test costs while increasing quality.
Labels: Advanced Statistical Modeling, Basic Data Analysis and Modeling, Consumer and Market Research, Content Organization, Data Blending and Cleanup, Data Exploration and Visualization, Predictive Modeling and Machine Learning, Quality and Process Engineering, Reliability Analysis
Applications of MSA Platform Tools in JMP® 17 (2023-EU-PO-1345)
Saturday, March 4, 2023
Measurement System Analysis is a methodical approach for identifying and managing the sources of variation that can influence the measurement system. At the top level, this type of analysis enables the quantification of the measurement system variation (MV) present in the Total Observed Variation (TV) of a process or system and separates it from the Process (part-to-part) Variation (PV). The measurement variation can be broken down further into Precision and Accuracy. For the Precision component, we follow a sequential method for continuous data to determine the adequacy of a measurement system. A Type 1 Gauge Study examines the accuracy and consistency of the measurement device, and a Full Gauge R&R Study explores the Repeatability and Reproducibility of the entire measurement system. This poster will demonstrate the practical application of these tools in JMP 17, as well as make reference to the new MSA Design tool to support the initial phase of measurement system analysis (the data collection plan). Attached are the JMP journal and JMP data tables (including reports) presented in the video. Hope you enjoy! Welcome to this poster presentation on the applications of MSA platform tools in JMP 17. Before we go into it, I would like to give a brief description of Measurement System Analysis, MSA for short. When we look at the total observed variation in a process, we use measurement system analysis to try to identify and manage the sources of variation that can influence the measurement system being used. It's a combination of measurement devices, people, procedures, standards, and so on. We can decompose that total observed variation into two components: the process component, sometimes called part-to-part variation, and the measurement system variation component, which is what we are really interested in with measurement system analysis. The measurement error associated with this measurement system variation can be broken down into further components, precision and accuracy. For the purpose of this poster, I will concentrate on the tools that enable us to identify sources of variation within precision, specifically repeatability, and a little bit on the bias component under accuracy. When doing measurement system analysis, there are several methods involved, particularly for continuous data. We start by examining the accuracy and consistency of the measurement device alone using a technique called a Type 1 gage study. This is sometimes also described as analyzing the pure repeatability of the system. We have one single part and one single device; if the measurement system requires manual intervention, we can have one operator. The idea is that we start by evaluating the pure repeatability of the system before going into more complex analyses where other sources of variation may be part of the measurement system. That second step is the Full Gage R&R, which examines both repeatability and reproducibility. Last but not least, there is the continuous gage linearity and bias study, but that is not covered specifically in this poster. So let's have a look at what this means in terms of JMP 17. In the new version of JMP, we have a new MSA method, the Type 1 gage study, that helps us with that initial phase of the analysis regarding the pure repeatability of the system.
So I'll show you a quick example of an output report for the Type 1 Gage R&R. What you can see here is that, by default, the report shows a run chart. This looks at 30 repeats of the same part using the same device or equipment, in order, so this timeline really helps us identify any special situations, any measurements that didn't work very well. There is a reference line at the nominal. If you're using a reference part, for example, we can identify whether the average of those measurements is in line with the reference part. That mean value can be added to the graph if we want to; as you can see, it sits right on top of the reference line, and if I remove the reference line, you can see that the average and the reference are very similar for this example. In this Type 1 Gage R&R study, we also want a reference of around 20% of our tolerance. In the Type 1 gage study, we limit the analysis to only 20% of the total tolerance in order to assess whether the pure repeatability is acceptable or not, but this specification, if you will, for the Type 1 study can be adjusted in the settings of the tool. It provides some summary and capability statistics: the usual location and spread references, particularly the six-standard-deviation spread, the number of measurements taken, and the tolerance. Then there are the two limits around the reference on the graph for the 20% of the tolerance, so plus or minus 10% of that tolerance. If you are used to process capability indices, the capability-of-the-gage indices you see here, CG and CGK, work in exactly the same way. The biggest difference is that we're looking at the capability of the gage and assessing its variation with regard to, in this case, 20% of the tolerance; but certainly we can look at the variation relative to those limits for CG, and at the variation and location together for CGK. This gives us a summary of metrics to evaluate the Type 1 Gage R&R study results. There are some percentages calculated as well, in terms of percentage of variation with regard to repeatability. Obviously, if you're using a reference part whose nominal value you already know, we can evaluate not only the pure repeatability of the system but also the bias, the difference between the average measured value and the reference value. That can be added as an additional test if we want to. This is really a hypothesis test of whether the bias is equal to zero, that is, whether the average and the reference value are the same or very close to each other; as you can see from the reported P value, in this case there is no statistically significant difference between the average and the reference value. Another useful visualization within this tool is the histogram, so we can look at the distribution of those measurements, in this case the 30 measurements of the reference part; that can be customized in the report as we go along. So, very quickly, we now have a great tool to start the measurement system analysis process, with what is called a Type 1 Gage R&R Study as part of the MSA platform in JMP. What we sometimes do is add a second step before we go into the Full Gage R&R assessment of both repeatability and [inaudible 00:07:43], which can be called a Type 2 study.
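As a rough illustration of the gage capability metrics mentioned above, here is a sketch of how CG and CGK are commonly computed from a Type 1 study (one reference part, about 30 repeats), using the 20%-of-tolerance and six-standard-deviation conventions from the transcript. The exact defaults and formulas JMP uses may differ slightly; the data and spec limits below are made up.

```python
# Sketch: Cg / Cgk for a Type 1 gage study, using the common convention of
# K = 20% of the tolerance and a 6-standard-deviation spread for the gage.
import numpy as np

def type1_gage(measurements, reference, lsl, usl, k_percent=20):
    x = np.asarray(measurements, dtype=float)
    tol = usl - lsl
    s = x.std(ddof=1)
    bias = x.mean() - reference
    cg = (k_percent / 100 * tol) / (6 * s)                # precision only
    cgk = (k_percent / 200 * tol - abs(bias)) / (3 * s)   # precision and bias together
    return bias, cg, cgk

rng = np.random.default_rng(3)
reads = 100 + rng.normal(0, 0.05, 30)   # 30 repeats of one reference part
print(type1_gage(reads, reference=100.0, lsl=98.0, usl=102.0))
```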
We have an example of that here; the only difference is that in between the 30 measurements, we removed the part from its holding fixture before each measurement. As you would expect, by adding that intermediate step between measurements, we expect more variation, and that is what we see here. Not only do we have more variation, you can also see that the average value of the readings is much lower than its target location. By turning on the bias test, we can see that the P value is now low, compared to the Type 1 study where we didn't remove the part in between measurements (the part was fixed and just measured 30 times consecutively), showing that there is a significant difference between the reference value and the average. This can be built up in several increments, adding sources of variation, even before we start adding multiple parts, operators, or pieces of equipment into the analysis. But if we do, JMP already had a Gage R&R study tool in previous versions. This is just a quick example of what that means in terms of the variability of the gage, and in this case we use the Gage R&R method. If you go to Analyze, Quality and Process, there is an updated version of the Measurement System Analysis platform. In the MSA method list, we can now see the Type 1 Gage study that I used for both the Type 1 and Type 2 examples; the only difference is that on the output report for the Type 2, I edited the title and called it Type 2 just to differentiate the two reports. For the Full Gage R&R, or the variability analysis, I used the Gage R&R method, and this is where we can also decide the type of model used, normally crossed, so we can see all the effects crossed with each other in the analysis; some additional options are also available. In this report, which I'm customizing, I've added some specification limits. Here, essentially, we're not looking at the reproducibility of the system; it's just another increment in evaluating repeatability. We are using not one but ten parts in this case, and evaluating that variation over five repeats of each part. As you can see in the Gage R&R report and table, the reproducibility component is zero, because we don't have additional equipment or operators being evaluated in this analysis, so all the variation in this study is due to repeatability. This is the traditional output that you would get from the Gage R&R method inside JMP, for reference. To finish with the new tools in JMP 17 for the MSA platform, I would also like to highlight that, as part of the planning phase for any Gage R&R study, it's important to understand the method to be used, of course, but also what a good method of data collection will be. In JMP, under DOE > Special Purpose, we now have a new tool called MSA Design, and this enables adding factors like parts and operators. If I quickly add three factors there, I can identify the MSA role involved in each of them, for example. This is a great opportunity during the planning phase to come up with the design that will help you do the data collection even before any analysis is done.
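To make the variance decomposition behind that Gage R&R table concrete, here is a minimal sketch for the layout described above: ten parts, five repeats each, a single operator, so all measurement variation is repeatability and reproducibility is zero. It uses a simple one-way ANOVA decomposition on simulated data; it is not the JMP Gage R&R report, and the column names are hypothetical.

```python
# Sketch: variance components for a study with 10 parts x 5 repeats and a
# single operator, so all measurement variation is repeatability.
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
parts = np.repeat(np.arange(10), 5)
true_part = rng.normal(50, 2.0, 10)[parts]            # part-to-part variation
values = true_part + rng.normal(0, 0.5, parts.size)   # repeatability error
df = pd.DataFrame({"part": parts, "y": values})

n_rep = 5
grp = df.groupby("part")["y"]
ms_within = grp.var(ddof=1).mean()           # mean square within parts (balanced design)
ms_between = n_rep * grp.mean().var(ddof=1)  # mean square between parts
var_repeat = ms_within
var_part = max((ms_between - ms_within) / n_rep, 0.0)
total = var_repeat + var_part
print(f"repeatability: {var_repeat:.3f} ({100 * var_repeat / total:.1f}% of total)")
print(f"part-to-part : {var_part:.3f} ({100 * var_part / total:.1f}% of total)")
```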
For more information about how to utilize the MSA Design feature, you can follow this link, which will take you to the JMP User Community video where Hyde Miller, JMP systems engineer, has provided more information about this tool. Hope this was useful for you. Thank you very much.
Labels: Advanced Statistical Modeling, Automation and Scripting, Basic Data Analysis and Modeling, Consumer and Market Research, Content Organization, Data Blending and Cleanup, Data Exploration and Visualization, Design of Experiments, Mass Customization, Predictive Modeling and Machine Learning, Quality and Process Engineering, Reliability Analysis
Exploring Group Differences and Other New Features in Structural Equation Models (2023-EU-30MP-1296)
Saturday, March 4, 2023
The structural equation models (SEM) platform continues to grow and evolve into a more complete and powerful platform. An important feature added to SEM in JMP® Pro 17 is multiple-group analysis (MGA). MGA allows users to test for differences in parameters across populations by enabling the specification of models that can have group-specific estimates or equality constraints on parameters across groups. In this presentation, we will demonstrate the use of MGA and other new features in SEM using real data examples. We start with a simple regression example and then turn to a longitudinal analysis example that showcases the flexibility of MGA. Lastly, we show how survey development can be expedited by a new feature that links Exploratory Factor Analysis to the SEM platform. Hi, everyone. I'm Laura Castro-Schilo, a senior research statistician developer working on the Structural Equation Models platform, and I'm really excited today to show you some of the new features that we have in JMP Pro 17. One of the big ones is going to allow us to explore group differences; we're going to talk about that a lot today. Our plan for today is that, hopefully, we'll spend most of the time in a demo, really showing you those new features. But very briefly, before we get into that, I want to remind you what Structural Equation Modeling is and why you might want to use it, and then hopefully through the demo I'll show you how to use it. Our presentation today is not very long, so what I also want to do is share with you some additional resources from previous Discovery presentations and developer tutorials where you'll learn in much more detail how to use SEM. In the overview of new features, we're going to first talk about multiple group analysis, and then some improvements that we've made for longitudinal modeling within SEM and for survey development. After we go over those, we'll go straight into the demo and really show you how all of those are used. Structural Equation Modeling is a very general analysis framework for investigating the associations between variables. Now, this is a very broad definition, and that's purposeful, because Structural Equation Modeling is a very broad technique in which a number of different models can be fit. Here I've listed a few of the models that you could fit within SEM, but this is not an exhaustive list; it really is a very flexible framework. A natural question that might come to mind is: if I can do some of these analyses somewhere else, then why would I want to use SEM? Sometimes you might not need to use it, and that would be just fine. But there are some circumstances in which SEM would be particularly helpful, and I've included here a list of what those circumstances might be. The first is that sometimes you might be interested in understanding the mechanisms by which things happen. This is a circumstance where SEM can be very useful. Oftentimes, when you want to understand mechanisms, that means you have variables that are both predictors and outcomes. Yet not many statistical techniques allow you to specify models where a variable can be both a predictor and an outcome. That's actually something that is very natural in SEM, so if this is something you're working on, SEM could be very helpful for you. You really want to leverage your domain expertise if you're using SEM. The reason is that in order to specify your models, you really need to think about what your theories are.
What is it that you know about your data? You come up with those theories, you translate them into a testable model, and then when you fit your models, you see whether or not there's support for those ideas. Now, a very important use case for SEM is when you're working with variables that cannot be measured directly. Latent variables are very important in a number of different domains, for example if you're interested in looking at customer satisfaction or the quality of a product. In the social sciences there are so many latent variables; personality and intelligence are some of the cliché examples you hear, but really, latent variables are all over the place. If you work with them, if you have research questions that entail latent variables, then you're really going to benefit from using SEM. A somewhat related reason to use SEM is that if you have variables that have measurement error and you actually want to account for that measurement error, SEM can also be very helpful. I say this is related to the latent variables because the way in which we account for the measurement error is by specifying latent variables in SEM. Measurement error can sometimes have unexpected consequences for our inferences, so it can be quite useful to account for it. Another benefit of SEM, and this one is very practical, concerns missing data, which, of course, are all over the place: the most popular estimation algorithm for SEM handles missing data in a seamless fashion, such that the user doesn't really need to do anything. Missing data are handled with a cutting-edge algorithm and you really don't have to worry about it as much. If you have missing data, I sometimes tell people that even if you just have a simple linear regression, you can benefit from using SEM just because missing data are handled and it's easy. Lastly, path diagrams are a critical tool for Structural Equation Models. Those diagrams are very helpful because sometimes even the most complex statistical models can be conveyed in a very intuitive fashion by relying on these diagrams. In JMP, we use these diagrams to facilitate the specification of our models, but also to convey the results of models. Those diagrams can be very helpful when you're presenting your results to any type of audience, really. All right, so this is just a very brief list of why you might want to use SEM. I do want to share a link here to a presentation that I gave along with James Cuffler, who's also in JMP development. We did a developer tutorial where we went into much more depth about the reasons why you might want to use SEM. If you want to check that out, I just wanted to share this link here. If you download the slides from the community, you don't have to type this long link; you can just click on it and check that video out. All right, so how to use SEM. Again, I'm going to go into a demo and I'll show you how to use SEM, but my demo is not going to be too focused on a tutorial type of presentation, mostly because of time constraints; we want to keep this short and sweet. What I want to do in this slide is share with you additional video presentations where you can go and learn, in tutorial form, how to use Structural Equation Models for a few different case studies. This first video is a link where I covered how to model survey data and latent variables.
We cover things like confirmatory factor analysis and path analysis with and without latent variables. If you have longitudinal data, this second video can be quite helpful: there I went over how to fit latent growth curve models and how to interpret those results. We'll do a little bit of longitudinal modeling here today in the demo, but we won't be able to go into the details in a tutorial way, so I definitely encourage you to watch that if you are interested. If you don't have prior SEM experience, I very much encourage you to watch this other video where James Cuffler and I talked about building Structural Equation Models in JMP Pro. This one is very introductory, so you might want to start with that one prior to going to the others. Okay, so now it's time for a little overview of the new features in JMP Pro 17. Multiple group analysis is a feature that I've been really looking forward to presenting, because this is something that extends all of the models that can be fit within SEM. It does so by allowing us to investigate similarities and differences across sub-populations. We do this by incorporating a grouping variable into our analysis. Now, the most popular multiple group analysis examples usually show a grouping variable that has few levels; things like demographic variables are used very often. Indeed, in the demo I'm going to do, I'm also going to use a simple demographic variable. But really, there's no limit to how many levels you can have. What really matters is how many observations you have for each level; you want to have a relatively good sample size for each of those subgroups. Now, there is a general strategy for the analysis. We're going to see this in practice, but I want you to start thinking about how this really works. It's actually quite simple. What we do in multiple group analysis is fit two models, where one of those models is a more restricted version of the other. Once we fit both of them, we can do a likelihood ratio test, or chi-square difference test, in order to make an inference about whether the restrictions that we imposed in one of the models are in fact tenable. This is how we figure out whether there are statistically significant differences across groups. Again, we'll see that play out in the demo. In terms of longitudinal data analysis, we've made it a lot easier to interpret the results from your models by looking at the model-implied trajectories through a new predicted values plot. We have also made it a lot easier to specify multivariate growth curves. If you're familiar with these models, they allow you to investigate the association of multiple processes over time. They can be very helpful, but it used to be a little tedious to specify them; now we've made that very easy and fast through the use of model shortcuts. For some advanced applications, we've also made it easier to define an independence model based on what users want to have as the independence model. There are also some improvements for surveys. This is mostly focused on streamlining your workflow for developing surveys. Usually the analytic workflow starts with exploratory factor analysis, and then you take those results and confirm them with an independent sample using confirmatory factor analysis in SEM.
What we've done, with the help of Jianfeng Ding, who is the developer for Exploratory Factor Analysis, is link the two platforms by allowing you to copy the model specification from exploratory factor analysis and then paste it into SEM so that you can easily and quickly confirm your results. We also have a new shortcut for switching the scale of your latent variables; sometimes this is helpful when you're developing surveys and specifying models. We also have a number of new heat maps that make it easier to interpret the results of your analysis. Now, last but not least, our platform has always been very fast, but in this release, Chris Gotwalt put a lot of awesome effort toward improving the performance of our internal algorithms even more. If you have lots of variables or lots of data, definitely give it a shot; I am very impressed and excited about what we have to offer in terms of the performance of the platform as well. Okay, so it's time for the demo. Let me show you what I have over here. I have a journal that I'm going to work through; hopefully we have enough time to cover three examples. The first one, perhaps not surprisingly, uses our Big Class data table. It's going to be a very simple example just to introduce the notions behind multiple group analysis. We have two variables, height and weight, and what I want to do is investigate the association between these two variables by sex. I'm going to go to the Analyze menu, go down to Multivariate Methods, and then Structural Equation Models. I'm going to use both of those variables and click on Model Variables. Now, the brand-new feature of multiple group analysis can be found in this launch dialog under this Groups button. This button is new, and it's what allows us to select our grouping variable: we select it and click on Groups in order to use it as the grouping variable. We're going to look at males and females and whether they differ in the association between height and weight. We're going to click OK. Now, this is the platform. If you have seen our platform before, it looks very similar, with the exception of these new tabs right here. The tabs are there to tell us about the different groups that we have in our analysis. In this case, there are only two levels for our grouping variable: we have a tab for the females and a tab for the males. One of the things you'll notice is that the path diagrams already have a default model for each of those groups. Those default models are the same; that's why when I switch tabs, nothing really changes. The Union tab, as the name implies, shows us what's in common across all the levels of our grouping variable, which is why this diagram also looks the same. In order to specify a simple linear regression in SEM, I select the height variable in the From list and the weight variable in the To list, and I link those two variables with a one-headed arrow, which adds that regression path to my model. This is just a simple linear regression where height is predicting weight. Now, sometimes I like to right-click on the canvas of the path diagram and go to Customize Diagram just to make the nodes a little bit larger, because I find that, especially when the diagrams are small, that looks a lot nicer. This is just a simple linear regression.
Now, notice that because I did my model specification under the Union tab, both the females and the males inherited the same changes and specifications to the model. If I make any changes within a group-specific tab, those changes will only apply to that group. But in this case, what I want to do is fit an initial model where both males and females get their own estimates for this linear regression. Keep in mind that the estimation of this model is all done simultaneously; we're not fitting this model separately for females and males. Everything is done simultaneously, but I'm still able to allow each of the groups to have its own estimates for the model. I'm going to click on Run, and we'll see there's a model comparison table where we can learn a lot about the fit of the model. Something that's new in our report is that we have these tabs for each of our groups: a tab for the females and a tab for the males. Now, if you focus on the regression coefficient, for example, I can go back and forth and see that I do, in fact, have a different estimate for that coefficient in each group. The coefficients look different, but I don't yet have a formal statistical test that tells me whether that difference in the association is statistically significant. At any rate, the males here have a value of about 3.4 and the females have a somewhat larger value. What we really want is to fit a second model where we force an equality constraint on that parameter estimate, and then we can compare it against this model. Let's go ahead and do that. I'm going to stay on the Union tab, select that regression path, and click the Set Equal button. This brings up a dialog that asks me to confirm that I do want to apply this equality constraint across all of my groups, which I do. I'm going to click OK. Now notice that there is a new label on the edge. If I look at the female tab and the male tab, that label is still showing up on that edge, on that arrow. That is our way to convey to you, the user, that the same parameter estimate is going to be used to describe that association in both groups. Okay, so let's go back to the model name; we're going to change it to "regression effect is equal," since we forced that to be equal in this model. We're going to go ahead and click on Run. Again, we can look at our model comparison table to see the fit of the different models. I can select the two models that I just fit, and because one of those models is a restricted version of the other (we say that the models are nested), we can actually do a likelihood ratio test. That is done very easily in our platform simply by selecting the two models and clicking on Compare Selected Models. We obtain the difference in the chi-square, which represents the change in the misfit of the model, along with the difference in degrees of freedom between the two models and the differences in fit according to some of the most popular fit statistics in SEM. Now, according to this specific test, it appears that the change in chi-square, the increase in misfit, is not statistically significant. If we use just this chi-square difference test, we would come to the conclusion that even though those two values are different, they're not statistically different.
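The "free versus constrained" comparison described above can be sketched outside SEM as well. The snippet below fits a regression with group-specific slopes and a regression with a common slope, then compares them with a likelihood ratio test; this is an analogous illustration using ordinary least squares in statsmodels, not the SEM platform's maximum-likelihood chi-square machinery, and the simulated columns (sex, height, weight) are stand-ins.

```python
# Sketch: test whether the height -> weight slope differs by sex by comparing
# a model with group-specific slopes to one with a common slope.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(5)
n = 40
df = pd.DataFrame({
    "sex": np.repeat(["F", "M"], n // 2),
    "height": rng.normal(62, 3, n),
})
df["weight"] = 20 + 3.6 * df["height"] + rng.normal(0, 8, n)

free = smf.ols("weight ~ height * C(sex)", data=df).fit()   # slope allowed to differ by group
equal = smf.ols("weight ~ height + C(sex)", data=df).fit()  # common slope across groups

lr = 2 * (free.llf - equal.llf)            # likelihood ratio statistic
df_diff = free.df_model - equal.df_model   # one constrained parameter
p = stats.chi2.sf(lr, df_diff)
print(f"LR chi-square = {lr:.3f}, df = {df_diff:.0f}, p = {p:.3f}")
```

A non-significant p-value plays the same role as the non-significant chi-square difference in the demo: the equality constraint is tenable.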
Now we can go back down to our tabbed results, and you can see that the regression coefficient is the same even when I go across the tabs. We could then say that there's no difference between males and females in terms of how height predicts weight. This is a very simple example of how we can use equality constraints across groups in order to test a specific hypothesis. Now, as you can imagine, I could go back into my model specification and also put equality constraints on the variance of height and on the residual variance of weight. If that is something that is of interest to me, if I want to test those differences, this framework allows me to do that. A lot of times you're going to have more complicated models, well beyond linear regression, or you might have more levels of your grouping variable, and that's totally fine. This is a simple example that hopefully allows you to see how you could extend this into a more complicated setting. Okay, so that is this example. I want to move on to an example that uses longitudinal data. We're not going to move away from multiple group analysis entirely; we're going to highlight some of those longitudinal analysis improvements and then bring back the notion of multiple group analysis. For this example, I want you to imagine that we have a data table with data from students who have taken an academic achievement test for four consecutive years. What we really want to find out from these data is how students' achievement develops over time and whether males and females differ in their trajectories over time. These are the two questions that we're going to focus on for this particular example. There is a sample data table that you will find within our sample data folder; it's called Academic Achievement. You can use it to follow along with this example. In this data table, we have 100 rows, and each row represents a different student who took this academic achievement test. These four columns represent the scores on the multiple choice test that was taken four years in a row. Those are the data that I'm going to focus on for fitting a longitudinal model. I'm going to go to the Analyze menu, Multivariate Methods, Structural Equation Models, and with those four variables selected, I'm just going to click on Model Variables in order to use them in SEM, and then click OK. Remember, the first question was: how does students' academic achievement develop over time? We want to characterize that growth, or figure out whether there is growth at all. We have our model shortcuts down at the bottom left, and you can see that under the Longitudinal Analysis menu, we have a new option. We'll get to that option, multivariate latent growth curves, later today, but we also have a few other options here that make longitudinal modeling very quick and simple. For this example, I'm going to use Fit and Compare Growth Models. When I do that, three different models are fit, and I obtain a chi-square difference test for all of the possible combinations. If I look at the fit indices and also at the results from this chi-square difference test, I recognize that the best fitting model here is the linear growth curve model. In other words, it appears that the scores on this academic achievement test over time can be best characterized by linear growth.
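For readers more familiar with mixed models than with SEM, the linear latent growth curve just selected is closely related to a linear mixed model with a random intercept and random slope per student. Below is a minimal sketch of that parallel formulation on simulated long-format data with statsmodels; it is an analogy, not the SEM fit, and the column names are hypothetical.

```python
# Sketch: a linear growth model (random intercept and slope per student),
# which parallels the linear latent growth curve fit in SEM.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n_students, n_years = 100, 4
student = np.repeat(np.arange(n_students), n_years)
year = np.tile(np.arange(n_years), n_students)
intercepts = rng.normal(50, 5, n_students)[student]   # student-specific starting points
slopes = rng.normal(1.0, 1.5, n_students)[student]    # student-specific rates of change
score = intercepts + slopes * year + rng.normal(0, 2, student.size)
long = pd.DataFrame({"student": student, "year": year, "score": score})

model = smf.mixedlm("score ~ year", long, groups=long["student"], re_formula="~year")
fit = model.fit()
print(fit.summary())   # fixed effects ~ mean intercept/slope; random-effect variances ~ their variability
```

The fixed effects correspond to the mean intercept and slope discussed in the transcript, and the random-effect variances correspond to the variability in individual trajectories that the spaghetti plot shows.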
Based on that, I will go ahead and focus on interpreting the results from this linear growth curve model. I'm going to open that, and recall that one of the new features for longitudinal modeling is a new predicted values plot that allows us to interpret the results of our models a lot more easily. If you're familiar with growth curve models, you know that some of the key parameter estimates are these right here: they tell us where our students start on average, how they are changing over time, and how much variability there is in those trajectories. Under the red triangle menu of this particular model, if I scroll down, I find an option called Predicted Values Plot. If I click on that, you will see that, by default, we show you box plots of the predicted values for all of the outcome variables in the model. Now, when you have longitudinal data, there is a very convenient option that allows you to connect the data points and obtain a spaghetti plot showing each of the individual trajectories predicted by the model. It's pretty cool, because the plot is, in fact, linked to the data table: whatever selections you make on the plot, you can also see in your data table, which is something you know to expect from JMP. In terms of interpreting the results of the model, it's no surprise that these are all straight lines, because we fit a linear model. But you can certainly see that there is a lot of variability in the way these students are changing. Some students start at the top and are still increasing; other students start low and actually exhibit a little bit of decline over time. But we also see an average trajectory that shows a little bit of increase over time. On average, there is some increase, but there's a lot of variability in how people are changing. Of course, one of the natural questions you might have is what factors predict those different trajectories, that is, the variability in the intercept and slope; that's something I've covered in other presentations, so I'm not going to talk about it now. But again, I encourage you to use the predicted values plot to better interpret your longitudinal analyses. We talked about users being able to specify their own independence model. That is something we do here in the model comparison table, and it can be very useful for longitudinal analysis. We do have an independence model that is fit by default, but if you choose to change it, you can always right-click on any given model that you want to set as the independence model, and we will take care of that change for you. That is an advanced technique; if you're not familiar with what the proper independence model for your analysis is, you should really take a look at the literature to make sure that you're using a good one, because it really varies by context. I'm sitting next to this beautiful window and the sun, it's a gorgeous day, so I'm going to have to adjust my computer so that I don't have all the light on my face; I apologize for that. Okay, so let's get back to this question. We asked how students' achievement develops over this period of time. We now understand that it develops in a linear fashion and that there is substantial variability. That's the answer to that question. The next question is: do males and females differ in these trajectories?
The way we're going to address that question is by using multiple group analysis. Back in the platform, we can use the main red triangle menu to redo and relaunch our analysis. Now what we're going to do is bring in the grouping variable: I add sex as a Groups variable. Just by doing this, we invoke our multiple group analysis functionality. I'm going to click OK, and now you can see that our report for the platform has the levels of the grouping variable here as tabs. Just as before, you can see that the males and the females have the same model as a default, but we can make changes to that. We're going to work within the Union tab because I want the changes that I'm about to make to the model specification to apply to both males and females. I will also highlight, as a little side note, that under the Status tab you will find group-specific information that we didn't have before multiple group analysis: information about your data, missing data, and so on that is specific to the groups. Okay, so let's go ahead and answer this question: do males and females have differences in their trajectories? Well, I already know that the linear model fits best, so I'm going to go to our model shortcuts, Longitudinal Analysis, and click on the linear latent growth curve. The shortcuts very quickly set up the model for me, making it very simple, and they do that across all of the levels of the grouping variable. I have the linear growth curve model. Notice that these key aspects of the model, the estimates that really characterize the change in our data, don't have any labels on those edges, which means that they're freely estimated across males and females. My first model here is a linear growth curve model. I'm just going to put a little keyword here. Oops, I erased it. Linear growth curve. But I wanted to note that this is freely estimated across the groups. I'm going to click on Run. Excellent. We can see here some fit indices. In my report, I can see a tab for the males and one for the females. Of course, as you'd expect, if I go back and forth, I can take a look at the results for the females and then go back and look at how those results are perhaps different for the males. This is interesting. There appear to be some differences, but again, we want to figure out whether the differences that we observe just from looking at these estimates are in fact statistically significant. What I'll do is go back to my model specification and do an omnibus test. In other words, rather than just putting an equality constraint on one of these estimates, I'm going to do that for all of them: the intercept mean, the mean for the slope, the covariance of the intercept and slope, and their variances. You don't have to do it this way; really, it's your research question that should guide where you place those equality constraints. In my case, I just want an omnibus test of whether the trajectories for males and females are different and whether or not I need separate estimates for those parameters. I have all of those edges selected and I'm going to click on Set Equal. Here I confirm that I do want those equality constraints across both groups. This is actually quite helpful when you have more than two levels in your grouping variable.
It might be that you want equality constraints across, say, two groups but not the third; you can uncheck some of those groups here if you need to. I'm going to click OK. Now all of those edges have a different label. You can see that if I look at the model for the males and the model for the females, those labels are the same; again, this is just to remind us that we're going to estimate only one value for each of those edges across groups. Okay, so this, once again, is a linear growth curve, but with equal growth estimates. Let's go ahead and run that model. We can look again at the fit of this model; it doesn't seem to be as good as the previous one. Because this second model is a restricted version of the first, we can select those two models and do a meaningful comparison by clicking on Compare Selected Models. As before, we see the change in the chi-square along with the change in the degrees of freedom. This tells us how much increase in misfit there is in our model, and whether that increase is statistically significant. If it is, which in this case it is, then we are saying that those equality constraints are not tenable; it was not a good idea to place them. Now we can say, with a formal statistical test, that there are statistically significant differences in the trajectories across males and females. Now, you might want to look at those differences by using the predicted values plot. That's something we can do just by going into the red triangle menu. But first, I don't really want to look at the model that has the equal growth estimates, because we just realized that those equality constraints were not a good idea, so I'm not going to look at that. Instead, I'm going to look at the first model we fit, and I'm going to do the same for the males. Under the red triangle menu, I'm going to click on Predicted Values Plot, and I'm going to connect those points because I know my data are longitudinal. This is the plot that is specific to the males, to the male sample. But it would be really helpful to look at this plot side by side with the plot for the females. It's actually quite nice that all of our red triangle menu options here are automatically turned on across all of your groups, so you don't have to go group by group turning on the things you want to see. Another trick that I really like is that when you have a tabbed report, you can always right-click on it and change the style of the report to a horizontal spread. This allows you to see the contents of the tabs side by side. I'm going to click on Horizontal Spread. Now notice that I have the males and the females side by side. I'm going to use the red triangle menu along with the Alt key in order to turn off the summary of fit, the parameter estimates, and the diagram; really, all I want to see is the predicted values plot. I'm going to click OK. Perfect. Now I can see the predicted values plots for the males and for the females side by side. Very purposefully, we have the Y axes here on the same scale so that these plots are comparable. Now you can see how the trajectories differ. We see that there's a lot more spread in the sample for the females. There also seems to be a bit of a difference in that average trajectory, in the amount of growth.
Again, there are many more follow-up tests that we could do here in order to figure out where the specific differences lie. If we wanted to test, say, whether there is a difference specifically in the variance of the slope, we could put that equality constraint in place and do more specific tests as follow-ups. But for now, I hope that this example really allows you to see how multiple group analysis can be used in a more complex setting, and how this new predicted values plot can be used to facilitate the interpretation of your longitudinal models. All right. We're almost at the end of the demo, and what I want to do very quickly with the same data is highlight the multivariate growth curves shortcut. Let me go back to the Structural Equation Models platform, and this time, imagine that we have two sets of scores over time. So we're going to be looking at two processes: we don't just have academic achievement on one test, we have it on two different tests, and we want to see how those two processes are changing and how they are related over time. I'm going to use all of these variables here, click on Model Variables, and then OK. Under the model shortcuts, remember the Longitudinal Analysis > Multivariate Latent Growth Curve option; that shortcut allows me to select the variables for one specific process. Here, I might have those first four variables; that's the first process I want to look at. Let's just say that those were math scores, so I'm going to call that process math. You get to choose what type of growth you want to specify for that specific process, for that set of variables. We're going to stick to linear growth, and then we can click the plus button to have that done for us right away. You can see the preview in the background: we have an intercept and a slope for math. Then we can change the name; the second process is science, and now we can select the variables, the repeated measures for that science test over the four years. Again, we're going to stick to the linear model, and we're going to click the plus button. Very quickly, that model is changing there in the background. We're done now, so I'm just going to click OK. Again, now I can just click Run and very quickly get the results for that model. This is an advanced application, but it is a really interesting one, because it allows you to look at how the initial time points, the intercepts across the two processes in this case, are related. Are they associated? And also the rates of change over time: if you have a higher slope over time in math, does that mean you also have a higher rate of change in science? According to this, you do, because we have a positive association between those two factors. Again, I'm just highlighting some of that new functionality. My very last example is for survey development. This is going to be very brief, I promise. Let's say that we want to figure out what the key drivers of customer satisfaction are. We know that perceived quality of our product and the reputation of our brand are really important. But before we can answer any questions about customer satisfaction, we really need to make sure that we have a valid and reliable way to assess those variables.
Because these are variables that are not observed directly, they're latent variables, and therefore it's difficult to make sure that we are measuring them in a reliable and valid way. Survey development is all about achieving that goal. I have an example here that is going to allow us to see how exploratory factor analysis is now linked to SEM so that you can do survey development in a really streamlined fashion. I have 843 rows in this data table. Each row represents an individual who filled out a survey. In that survey, they gave us ratings and answered questions about the perceived quality of our product. They also gave us answers about their perceptions of our brand. Then they also answered questions about their satisfaction with the product. This could be things like, how likely are you to recommend our product to someone you know? Those types of questions. I already have a saved script for the factor analysis platform. I'm not going to get into the details of how you use this platform, but I do want to focus on the fact that the results from this analysis are right here in the rotated factor loading matrix. That is the key result from this analysis. Usually, what we want to see here is that the questions that are supposed to measure, in this case, satisfaction are in fact loading onto the same factor. In this case they are, and that's good news. We see the same pattern for quality. The more substantial loadings are for these first three questions on quality. Notice that there is one quality question that doesn't seem to have a high loading on any of the factors. So maybe we would go back and make sure that the wording of that question is good, or we might just want to throw out that question altogether. There's also a couple of questions for perceptions of our brand that didn't seem to do very well. Again, usually you do very careful selection of your questions. You would go back, read what those questions were, and ask whether there is something we should tweak, or whether we should just get rid of them. Now, for the time being, the feature I want to highlight is that under the red triangle menu of this model, there is a new option for copying the model specification for SEM. I'm going to click that. What it does is store the loadings that are bold here in our final rotated factor loading matrix so that we can use them in the SEM platform. Normally, you'd want to collect a new independent sample so that you can confirm these exploratory results. Now, let's just assume for a minute here that this data table is my new independent sample, and I would now go to Analyze, Multivariate Methods, Structural Equation Models, and I can use all those same variables, click on Model Variables, and then Okay to launch the platform. Normally, again, you want to confirm the results that you found with an independent sample. What you can do is, in the main red triangle menu, click on Paste Model Specification. Now notice that the factor loadings from the factor analysis platform were rescaled by the standard deviations of the indicators. I'm going to click Okay, and you can see now that the values here are fixed for the loadings of those latent variables. They're fixed to correspond to the values from the factor analysis platform. Now, again, they have to be rescaled because the variance of the variables is taken into consideration. That's the proper way to specify the model.
But it's really nice to be able to streamline this workflow, because normally, if you really want to fit a confirmatory factor model based on an exploratory factor analysis, you would have to put these constraints in by hand. That's really tedious, so we've made it very easy. These latent variables have loadings that are fixed to known values from a previous study, from a previous exploratory analysis, and we can now confirm whether or not that factorial structure still holds with a new sample. One thing I should clarify is that the three variables that did not have substantial factor loadings in the report are not being linked to any of the latent variables. Really, we don't want these to be here in the analysis. What I can do is use the red triangle menu, which also has an option for removing manifest variables from the analysis. I'm going to use that so that I can quickly find Quality 3, Brand 3, and Brand 5, and I can just click Okay to get rid of those variables, because I don't really want to fit my model with them in there. Again, now I can just run this model, assess the fit, and figure out whether I can, in fact, confirm my results from exploratory factor analysis using confirmatory factor analysis in SEM. That is all I have for this demo. I hope that this is helpful, and I look forward to answering all your questions during the live Q&A. Thank you very much.
Labels (8):
Advanced Statistical Modeling
Basic Data Analysis and Modeling
Consumer and Market Research
Data Exploration and Visualization
Design of Experiments
Predictive Modeling and Machine Learning
Quality and Process Engineering
Reliability Analysis
Navigating Your Data Workflow: Workflow Builder Grants Your Wishes for Data Cleanup (2023-EU-30MP-1262)
Saturday, March 4, 2023
What if you could save time in your process of collecting data, cleaning it, and readying it to begin your analysis? Accessing data and preparing it for review is often the most time-consuming part of creating a new data analysis or project. With that in mind, we would like to introduce the Workflow Builder in JMP® 17. With this exciting new feature, JMP users can now record their entire process from beginning to end, starting with accessing data from multiple sources. Working with the action recorder (added in JMP® 16 to track steps and provide scripts that can be saved and reused), Workflow Builder tracks all your changes in data prep and cleanup, data analysis, and reporting. In this presentation, we will show how to operate the Workflow Builder, save each action, and then replay and share them in a polished report. This is sure to become your new favorite feature in JMP 17. No manual cleanup means extra time in your day! Hi, I'm Mandy Chambers, and I'm a Principal Test Engineer in the JMP Development team. I want to talk to you today about navigating your data workflow. The Workflow Builder is new for JMP 17, and I think it can grant all your wishes for your data cleanup. The Workflow Builder is the ability to record your JMP data prep and analysis workflows. It tracks all your changes that you make, cleaning up and reporting. It records your steps and you can use them over and over again. It allows you to save your data and your workflows, package them together nicely, and share them easily with others. It also is just going to save you lots and lots of time in your day. This is a screenshot of the Workflow Builder. It's listed underneath the File menu system in JMP, File, New, New workflow. You need to open a data table or import data to begin and you will be prompted with a question that says, Do you want to record? When you say yes, then the screen on the right will show you that, like in this case, we opened Big Class and it will be recorded. I'm going to get into this today and show you as much as I can. You can see here there's lots and lots of stuff happening. There are step settings, there are navigation buttons, there are images, and so much more. I probably can't hit everything today, but I'm going to do my best to try to show you. Let's get right to the demo and I'll get started. This is a new utility. I do think it saves you time and it allows you to do clicks instead of coding. I think that's really cool and I think for a new JMP user, especially, this is going to really make their life easy. Great job to JMP for building up ways that we can capture JSL. In 16, JMP introduced the Action Recorder, and now in 17, we give you the Workflow Builder along with even more JSL added to the Action Recorder. The Workflow Builder allows the JSL to be saved and the steps replayed as well as shared. A gain, no coding is really required unless you want to add that. I'm going to get right into this by showing you a workflow. This is a workflow that I saved as a demo. Most of these examples, I have tried to use sample data so that you can go back yourself and maybe try some of these. I'm using the food journal data table from JMP. I'm recoding a column, I'm changing a column properties order, and then I ran a Text Explorer report. You see the buttons here at the top, and this is a saved workflow. I'm going to just click this middle button which says to execute the workflow and run it so you can see how quickly it runs. Notice a couple of things here really quickly. 
Over here on the right, you see these little images pop up. All that is, simply, if I click on that, is a screenshot of each of the steps. It doesn't really do anything. It's a screenshot. It just shows you what's in there. Again, if I click this one, this is the column where I changed the column property. That's nice to have in case you don't remember what you did. The little green check is the check that says the step executed properly. If it didn't execute properly, you would get a red X, so you would know something didn't work right. Now, I'm going to recreate this for us. Looking at this Workflow Builder here, I had created my local data filter, which is why I changed the order. I wanted it to be in this order. I hung out on this late snack thing because I was looking and thinking, "Good heavens, if I had cappuccino, mocha, and chocolate, and candy, and sugar at night, I would never go to sleep." I don't know about you, but anyway. Let's close this up. That's this little reset button right here; it closes everything in the workflow. Then I'm going to go up under the File menu system. I'm going to click New and I'm going to go down here and click New Workflow. Here's my new workflow. Let's observe a couple of things here real quick before I get started. This is the little record button, as I said, so we'll put that... I'm not going to push that. I'm going to show you how that executes. But you have places where you can get data from. The JMP Log history over here, I had reduced down just because I didn't want to see it while my workflows are running. But throughout the day, you could open the Workflow Builder and you could just use it all day long, and it will stack your statements down here in increments of 10 minutes, an hour, two hours. It'll save things. You'll even have something, if you left it up, that would show it was done yesterday. You could come back to this and you could grab statements, and all you have to do is grab them and hit this arrow. You can push them up, or you can grab them and drag them, or copy and right-click, I believe, and copy and go up and paste them in. All sorts of ways to get things into your Workflow Builder. I'm going to right-click here and delete this. I'm going to cut it out because I want to do this the other way. Let's get out to my JMP home window and just simply open my food journal. There's the button I told you you would get that says, "Hey, do you want to record this?" I'm going to say yes. Notice that in the Workflow Builder right here, the little red dot is now hollowed out. I am in record mode, so everything I do will be recorded. I'm going to quickly go in here, so I can make sure I get through a lot of examples. I'm going to recode this, and I'm going to do it in place because I don't want an extra column. I'm not going to type a whole bunch of stuff this time. I'm just going to convert this to uppercase so it makes the change, and I'm going to recode it. Then I'm going to go in to my column info, and I'm going to go down and add value ordering on this. Now I'm going to reorder this so that I have breakfast first. I've got an AM snack. I need lunch to move up here. I need a PM snack to be after lunch, then I have dinner, then I have a late snack. There's the ordering I wanted. You'll see that those steps have been recorded. There's my open, there's my recode, there's my change of column property. Now I'm just going to grab this Graph Builder here; it's the one I think I ran. That's easy.
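As an aside, here is a rough sketch of the kind of JSL the Workflow Builder and Action Recorder capture for the open and value-ordering steps just described. The file path, column name, and value list are assumptions from the demo, and the property name may vary slightly by JMP version:

    dt = Open( "$SAMPLE_DATA/Food Journal.jmp" );   // assumed sample-data file name
    Column( dt, "Meal" ) << Set Property(            // column name is a guess from the demo
        "Value Ordering",
        {"Breakfast", "AM Snack", "Lunch", "PM Snack", "Dinner", "Late Snack"}
    );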
Then I don't have to recreate a lot of stuff. I'm going to add a local data filter and I'm going to use that meal column. Again, here's the order I wanted. If I click on this, I get a slightly different view. It's not a Text Explorer, but you can see here, here's the column for chocolate and of course, it has the most calories. One of my favorite things is chocolate. Anyway, notice that this Graph Builder is not recorded yet in the Workflow Builder. Platforms don't get recorded right away, but it records after you close the Graph Builder, or we added a button under here. If you want to save the script, you can save it to the workflow. But for all I'm doing today, I'm just going to close it. There it's been written to the Workflow Builder. The other place that you can go and grab stuff, and this is one of the things I love about the Action Recorder. If I wanted to, I can go over here and you can click on these steps as well now. Under this red triangle menu, you can save the script and say add it to the workflow. You could always go over there and get something if you needed to. Let's stop recording. Let's hit the reset button and let's close this thing. Now, let's run it and it should run exactly the same way as the first one. I did a little bit different platform, but you can see that column has been recoded and there's my Graph Builder. Now, we have created your first Workflow Builder. The first is as simple as that. Moving on to a second example I have. Let's look at this virtual join example I created. I used the Pizzas examples from the sample data library. If you're not familiar with virtual join. I'll give you a crash course here in a minute or less. But I opened the tables, I created link IDs, which you need with virtual join, and then I created link reference columns and I simply ran a Graph Builder. I'm going to run this and then I'm going to show you some things about it. I'm going to tweak it a little bit, show you how we can clean up the Workflow Builder. Here's virtual join in 30 seconds or less. A link ID is created in pizza subjects there. A link ID is created in the profiles here. Then I went to pizza references, which is what I like to call my main table with virtual join. A ll my references are set up this way. All of these steps I did do manually. It was all saved in the Workflow Builder and created, and then I simply ran this graph. That's that. Ways that you could clean this up as a presentation is if I was showing this to somebody, they don't really care about seeing the data tables and all that stuff. They would be more interested in my graph and maybe if I did a distribution or a tabulate, the reporting I would get. I'm going to show you a way to go in here and do some things on the right hand side here of the Workflow Builder. You open the step settings and this is where all the magic happens. When you're doing these things, the JSL is getting is saved. This is where it's happening and we have some additional buttons over here of things we can do. Like I said, I really don't care very much about the table showing, so there's a nice little option in here called hide tables. I'm going to click this and I'm going to go down and add hide tables to every one of these so that it quickly hides those three tables and I don't have to worry about that anymore. The other thing that's nice is that these steps right here are actions are all doing things to the data tables. 
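For readers who want the 30-second virtual join crash course in script form, a rough sketch of how the link ID and link reference properties are set follows. Table, column, and property argument names here are assumptions based on the demo, not verified against the sample data:

    // Mark the key column in the subjects table as a Link ID
    Column( Data Table( "Pizza Subjects" ), "Subject ID" ) << Set Property( "Link ID", 1 );
    // Point the main table's key column at the ID table
    Column( Data Table( "Pizza Responses" ), "Subject ID" ) << Set Property(
        "Link Reference",
        Reference Table( "Pizza Subjects.jmp" )
    );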
I'm going to select these and I'm going to right -click and I'm going to say group those selected things together and it will put them in a little group. I'm going to name this something. They're going to always be called group, group one, group two so you might want to name them something that means something to you. I'm going to call them link ID and link reference. The nice thing about this is as you're building workflow, I can reduce this down. Your workflows could be really long. If I reduce this down, I have a lot more space. You can do groups within groups. That's nice too. Anyway, I'm going to expand this for right now. Then you probably remember when I ran this, it ran really fast. My final step is a Graph Builder, and I want it to hesitate just a little bit so you can see it hesitates. I'm going to go in here and add a custom action. This is where the coding comes in. If you do know JSL, I'm going to add a wait statement here, and I want the wait to happen before my step, so it's going to be right here. You do have the ability in this to use the arrows to push things down, to push things up. You have a little bitty trash can right here, so if you add something, you can delete it. You also can simply leave something in and uncheck it, and that actually works really well, too. I'm going to go in here and this is where because I know JSL, I'm going to type a wait statement just as simple as that. Let's close this up and let's run it and let's watch. There's the hesitation and there's the Graph Builder. Let me close it one more time and get you look right here and look for a little running man when I run this. I'm going to execute it. There's the running man and there's the Graph Builder. You can see that this is much more presentable if I'm doing something. Again, if I had other reports I wanted to show, that would work really well. A couple more things you can do. Obviously, when you create a workflow, you save it up here by saying File, Save or Save As and give it a name. Every workflow has an ending of a . jmp flow. If you want to create a Journal, the developer was cool in creating something called Add Steps to Journal. You get this right here, which is nice. Journals are nice, but they're hard sometimes to make. This is nice. Each step is in here. The code is in here. You could run it, you could clear it, you can look at it. I've got a thumbnail down here for my Graph Builder, and if I click this, I get the full -size Graph Builder. Really nice feature to add that in there. The other thing I wanted to point out is if you go up to the red triangle, you have the ability to save the script to the script window. This is the entire script for doing all of these things we've just done. Even the hide function is added at the top to hide with the tables. The only thing that you need to know is that this will run just like JMP would run prior to JMP 17 without a Workflow Builder. This script does not rebuild the Workflow Builder window. The only way you can use the window is through the UI. But it's nice if you want the script or you just want to save it that way, then you have that ability to do that. Moving on. The next Workflow Builder I wanted to show you was an educational type. Let's look at this one. This was done by Peter Hersh. He did something much more complicated and I took it and made one of my own. But basically, it's a workflow that opens a data table, it runs a distribution. 
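Going back to the custom action added a moment ago, that wait statement is just ordinary JSL. Something like the line below, where the two-second pause is a guess at what was typed:

    Wait( 2 );   // pause about two seconds before the next workflow step runs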
It goes out to a particular section, like right here, and it's using what we call a show message window, a modal window where it stops. I put in a definition for quantiles that I took right out of a JMP book, because I don't know that stuff. Then I said okay, and I did a second definition, which just summarizes the other stats that are listed in the summary statistics. Kind of a nice thing to do here. Notice a couple of things about this workflow. I'm going to close this up. Notice I don't have a red button. If I go here and say File, New, New Workflow again, I don't have a red button and I don't have the JMP Log history. When I set this up, I put it in something called presentation mode. That's the first thing under the red triangle. You'll see, if I uncheck that, those objects come back. But I didn't feel like you actually need that in there for this, so I turned it off, because it's really something you're using for teaching. The other thing that you can also do in here is duplicate a workflow. You can just open a new one, and this would be one where you could change something. Let's make a couple of changes here, because I just want to show you how nice it is that this can be changed. I'm going to go and get a different JMP table. I'm going to go get the Cleansing table. Then I'm going to show you how this is set up inside of the report. The report step code was to run the distribution, which I will also have to change to use the same table. Then I need to change the column, which in this table is called pH. The dispatch for the quantiles will have to be pH as well. The distribution is run, but then it's been assigned; this is a little JSL where it was assigned to a report. Then the show message window, which you grab from here, this is how this is added, was added in here. You give it a title; the title of this message window is Quantiles, and that's where I pasted the definition. Then your next step is to clear. It's another JSL step to select your report and deselect it. Then you go on and select the second section of the report, which is the summary statistics. I need to change this again to pH. Then at the end, here's my second show message window that was added. You can add as many of these as you want to. It's a modal window, so it will stop and wait for you to do something. If I've made these changes correctly, let's run this. There you can see there's the Cleansing table, there's the pH column with the distribution. I'm selecting the definition of the quantiles, and I'm going on and using the definition for the summary statistics at the bottom. A really nice feature here, a really cool thing that you can actually do with teaching, I think. I think there's going to be a lot of people that enjoy using that. Workflow subset is also a really cool thing. You can create a subset of a workflow by going in here and selecting temporary subset, or you can actually do it on the right-hand side, where you open a table, you add an action, and you say subset in here, which is what I've already done with this. Here's my subset. You get this little window whether you do it from the temporary side or whether you do it here, and it's stored right here always, just so you know. But we built in some selections here. Your selections are to get all the table, 50% of the table, 25%, or whatever, and that's what I've checked. These are choices. This is going to make a lot more sense for a table that's really big.
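The show message window above is a built-in workflow action, but a rough JSL equivalent of a modal definition box might look like the sketch below. The wording of the definition is illustrative, not the text used in the demo:

    New Window( "Quantiles",
        << Modal,                  // stops the workflow until the user responds
        Text Box( "A quantile is the value below which a given fraction of the observations fall." ),
        Button Box( "OK" )
    );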
This is great because you can run some analysis on some of the table, maybe that's big. Then when you're all done with capturing everything, you would go back and maybe change it back to all of your data and it should work seamlessly. What I did here was I'm using something everybody's probably really familiar with consumer prefs, and there's about 446 rows, so I'm using 25 %. I'm going to run this and I just ran a categorical platform. You can see here, this is the output from this on just 112 rows. That's all it used in this case. Now, I'm going to close this up and go over here and say, "Okay, let's just say I'm done with what I want to do and I don't need a subset anymore." Well, I can go back here and delete this with the trash can, or I could simply just uncheck it this time and basically run it again because I might want to do something later with it. Now when I run it again, you can actually see right here that you get the 448 rows. Okay, that's how many are in there. You get your full platform, you get all the data analysis. I think people are going to get a lot of mileage out of that as well. Being able to do the subset, especially if you're using millions of rows of data, I think that will be very, very helpful. One more thing I forgot just then. Let me show you quickly how to create a workflow package. If you go in here, last thing in the menu is to create a package. The difference in saving a workflow locally and creating a packages locally, it's for you. So it's going to go wherever you put things on your drive, on your Mac or Windows or wherever. But when you create the package, it creates a little temporary place and you can package it together and send it to somebody and it should work seamlessly for them. What this is telling me is that I need to have this data source of consumer prefs attached to this for it to work. There's that little button I told you about that said presentation mode. If you were giving it to somebody and they didn't really want them to change anything, you could check that or you don't have to check that. Let's say, okay, and I'm going to name this workflow package, just let it default. It adds the same . jmp flow to the end. I'm going to save it here. Then what I want to do is go back and hopefully I can find it. There it is. I'm going to open it and I want to show you real quickly how it saves that. It did do presentation mode, but just so you can see, here's the temp consumer prefs. Just to prove that it runs, there's this and it runs the same way. Really nice feature. Packaging is great for sharing stuff with other coworkers, sending reports and that thing. The generalized workflow is another really nice utility that I think people are going to use. This particular example starts out by not opening a data table. It was built with a template data table and and I ran a distribution. If there's things you like to do every day, like let's say you run a couple of distributions and maybe you run your favorite Graph Builder and you're always doing the same analysis over and over in a given day on all your data, this is the way to set this up. The way you set this up is you go under the red triangle menu to References, and you click this little button that says Manage. Now I'm in this Managing window and you can see here this is where I use this ANOVA template JMP table. I typed in this prompt that says select the table that I want to use for analysis. I asked the mode to be every time I run this, I want it to prompt me. 
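Going back to the temporary subset option described above, a minimal JSL sketch of drawing a roughly 25% random subset follows. The table name is an assumption, and in practice the built-in workflow action handles this for you:

    dt = Data Table( "Consumer Preferences" );                  // assumed name of the open table
    dt << Select Randomly( 0.25 );                              // select about 25% of the rows at random
    dtSub = dt << Subset( Rows( dt << Get Selected Rows ) );    // build the analysis table from those rows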
Then I had three columns in that table that I used for my three analysis variables. For every one of them, I'm saying each run, select a column for analysis. I'm going to be prompted for that. Then you save this and this reference is all set up for this particular workflow. Let's run this real quick and let's see. Here's my choose table thing right here. Let's go other. It's giving me an option to go out here and I'm going to go to my data table and I've got a couple of things I wanted to try to show. I'm going to grab hybrid fuel economy here. Here's my prompts, what columns would I like to use? The first one I want to do is combined in BG and then I want to go and grab engine and then I want to grab the number of cylinders. Here's my three distributions, and then I've got my favorite box plot. That's just showing one. Just for fun, I wanted to show you a second one. Again, hybrid fuel economy is open, and so I'm going to select other, and I'm going to go to the sample data again, and I'm going to look for Titanic passengers and open this. Then I'm going to go in here and select I want age, and then I want sex, and then I want passenger class. A gain, here's my favorite box plot and here are my distributions and everything ran exactly like I wanted it to. Really good thing to use reference manager in order to build a template out. Notice when I close this, this didn't close the tables. The reason it doesn't close those tables is because those tables, there's not a statement in here that says open a table. We opened those tables from the file menu system, so I'll just simply go up and close those myself. The final example I want to show you is something that is Workflow Builder. We had some comments about people wanting to use workflows for archived projects. This is a project I did back in 2015, 16 when I started with JMP, and I think I showed this another time and mentioned that I had been working six years, and I went back the other day and thought, Wait, that's like eight years. I can't count, but other than that. But this is something where I went out to a website, I read in data on nesting in North Carolina, and it would continue to change. You would read it throughout the summer and I did some work down at Oak Island, which is a beach where we go often. Anyway, I had saved all of these scripts in a folder that I worked on. I thought, I'm going to try Workflow Builder with this and see what happens. Basically, instead of opening a table first, what I ended up doing is I went out and I used the custom action field where it says add custom action under here as my very first step. My first few steps, I went and took my script and I pasted the script I had saved inside of this window. This is what's in here. At the bottom, I'm saving a table. The data that I was actually using in this, all I had to do was go to the same web page and change it to 2020, 2021, 2022. That's what I did. Then there's a turtle species table where you go and do the same thing. Basically, I read these tables in and then I went to the Workflow Builder and I said, Okay, now what I want to do. Once I got them in, I started building my steps. I concatenated the table. I thought, well, I don't need those original tables, so I closed columns so they get closed. I added some columns. I created a change here. I hid and excluded some totals I didn't need and then I ran my platforms. I'm going to run this so you can see. 
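For the turtle project just described, a minimal sketch of the concatenate-and-close steps might look like the lines below, assuming dt2020, dt2021, and dt2022 hold the per-year tables read in by the custom import actions:

    dtAll = dt2020 << Concatenate( dt2021, dt2022 );   // stack the three years into one table
    Close( dt2020, NoSave );                           // the originals are no longer needed
    Close( dt2021, NoSave );
    Close( dt2022, NoSave );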
The cool thing about the platform runs that I'm going to show you is that these are scripts that I didn't have to change at all. As long as my columns were named the same thing, I just took the scripts and copying and pasted them right into here and they worked beautifully just like they did eight years ago. Notice in this workflow here at the bottom, these two steps right here are italicized and grayed out. The reason they're in there is because I didn't want to show them today. If I right -click on this, you'll notice that the step enabled button is not checked. I'm going to check it just so you can see that change. Now that changes so that it's dark and it's like everything else. A nice feature that you can do this because as I was looking at this, I thought, "Well, I don't want to show this today, but I might want to keep them. I don't want to delete them." This turns it off so it won't run, but I don't have to necessarily get rid of it. Let's run this. Pay attention at the top. This is a good example to watch the little running man that pops up here. It takes a minute. You see him because they're going out to the internet, it's actually getting data. This little part goes a little slower, but then it takes off and goes very fast. But once the data is in here, then everything else runs really fast. Really nice that script all worked. Here's my data. I have this turtle nesting data showing May, June, July, and August for each of these years, 2020, 20 2 1 and 2022. Nesting just for a case study and point tends to trend up, and then the next year it might go down. Again, it did that. If you go back and look at data from way back, it's just up and down, up and down. But 2022 was a pretty good year for nesting in North Carolina. Then this was a really cool graph that I had before that I loved. It's a bubble plot, but I went out and grabbed a turtle SVG file and plugged the turtles in for the bubble plot. This is showing nests totals with false crawls, and just false crawls are when a turtle comes out to lay her eggs and then she's scared or something and she doesn't lay or she changes her mind. They're usually pretty even numbers on a certain beach, they'll be pretty close to each other. That's why this bubble plot is cool the way it works. It's showing the nesting totals with the false crawls. I'm interested in this turtle right here because that's Oak Island where I said that I go to the beach and I did some work down there with them back in 2015, '16. Kind of a fun graph. Kind of neat that this is an archived project and it actually works now. It's a good way to record and save stuff. That's all I have today. I just want to say that I want to thank the development staff for working so hard on designing this. Hernes Pessour, David White, and Evan McCorkle, just to name a few. Julian Paris was instrumental during the design phase and advisement of this. I do have a reference down here for the sea turtles if you're actually interested. In closing, I just want to say I think Workflow Builder is the best new feature for JMP 17, but I'm a little biased, but I do believe it's going to save you time with less coding and more clicking. I think you're going to get more and more out of reusing recorded and repetitive steps. It should simplify your work efforts, and then it will definitely accelerate your daily processes, leaving you much more time in your day. Thank you for listening today. Thank you for letting me share with you about the Workflow Builder. Please try it out. 
Please let us know what you think and we'll look forward to hearing your feedback.
Labels (12):
Advanced Statistical Modeling
Automation and Scripting
Basic Data Analysis and Modeling
Consumer and Market Research
Content Organization
Data Access
Data Blending and Cleanup
Data Exploration and Visualization
Mass Customization
Predictive Modeling and Machine Learning
Quality and Process Engineering
Reliability Analysis
Automatic Data Refresh in JMP® Live 17 (2023-EU-30MP-1194)
Saturday, March 4, 2023
In past Discovery talks, we've shown how to acquire data, create a report, and publish it to JMP® Live using the desktop task scheduler. But what if your JMP Live report does not change daily? What if only the data changes, and you want to share those updates with your colleagues? JMP Live can now schedule the refresh of your data on JMP Live without having to publish your work again. This presentation will show how to use this new capability and discuss this new feature in the context of JMP and JMP Live plans. 2023-EU-30MP-1194 - Automatic Data Refresh in JMP Live 17.mp4 My name is Brian Corcoran, and welcome to Automatic Refresh of Data in JMP Live 17. I'm a JMP Development Manager, and my group is responsible for JMP Live. What is JMP Live? For those of you who may not know, it may be worth giving you a little introduction. JMP Live is a web-based collaboration site: users can publish reports, and the data behind them, from their desktop version of JMP to JMP Live. Users on JMP Live can interact with those reports, and if they have JMP, they can also download them to work on those reports with their desktop copy of JMP. A copy of JMP is not necessary, though, to use the JMP Live site. Now, in JMP Live 15 and 16, we required users to publish new content from the desktop application in order to update things. A common request that we got is that it would be nice if the server could do this for me when I'm not working, and I could just look at an updated copy at my leisure. That forced us to revisit how we treat data in JMP Live 17. JMP Live 17 really represents a major rewrite of the product. We made data an equal to a report. Before that, it was along for the ride, hidden, and you really wouldn't see it being transmitted up to JMP Live. Now you can publish data independently. You can look at it on the JMP Live site. You can update just the data, and any reports that use or share that data will all automatically be recreated with the new data. This work was all done to provide the foundation for refreshable data. Here, the contents of the data post on JMP Live are refreshed on the server side, and there's no intervention by a user in the JMP desktop client. Usually, data of this nature is in a database or some REST-based web endpoint or something like that. It has to be data that is accessible from the server where you have JMP Live installed. JMP Live provides, we hope, an easy-to-use scheduler, so you can set up a repeatable, hands-free refresh of your data. This fulfills the dream of going home at night while the data is refreshed and the reports are regenerated. When I come in the morning and I'm drinking my tea or coffee or whatever, I can look at the updated report and make decisions based on that new data. I'm going to provide a variety of scenarios on how you can learn to do data refresh with JMP Live. Let me first shut down PowerPoint, and I'll bring up a copy of JMP Pro 17, but this will work equally well on regular JMP. All right, so first, I'm going to start with a really simple explanation of how we separate reports and data in JMP Live 17. I think it's important to understand that before we proceed to the more complicated examples. I'm just going to bring up a sample data set that we ship with JMP, the financial data. It's just data for 500 companies across a variety of industries, and you can see them organized by type. It's basically sales and profitability data, number of employees, things like that.
Let's suppose that I create a simple bivariate plot of sales by number of employees . There's not a lot to it, but I can hover over the points and see the data. Let's suppose I want to publish that to JMP Live. I'm wanting to go here, and I'm going to say publish report to JMP Live. I've set up a connection to a server that's internal to SaaS and that has access to SaaS resources here. I'm going to publish this as a new report. I've set up a folder called Discovery to publish this report into. We'll just go ahead and do that. The first time it makes the connection, it can take a little bit longer, but there we go. Let's go ahead and bring up this JMP Live release. I'm going to go to my space that I just published to, Brian Corcoran, in my Discovery folder. There is my report. If I open it up, you can see I can hover over it just like I did in JMP. But I'm looking at this, and it's a boring report. There's not a lot to it. Maybe I should have included some more information. Well, let's go back to our report. There we go. Let's suppose I want to add a fit mean and a fit line to that. I also want to supply a local data filter. I'm going to make a filter by industry type so that I can cycle through each industry and look at the individual companies involved. If I get the drug pharmaceutical companies, I can hover over the one with the most sales, and we see it's 9.8 billion. Now, let's suppose that I want to update this report. But in the meantime, maybe I've got information that says, "Hey, this is not a $9 billion company. It's a $19 billion company, but I'm waiting for verification on that. I don't want to publish the data with this, but I really would like to update the contents of my graphic. Well, we can still do that. I'm going to go ahead and publish the report. But this time, we're going to do a replace operation. We'll select replace an existing report down here. It's going to ask us, what report do you want to replace? It's going to give us the candidates like the most recently accessed, and so there's our financial report . I'm going to say next there. Here it says, "What do you want to do about the data? " I'm going to select, Use the data that's already up on JMP Live. I'll say replace the report. It goes ahead and does that. If I go up to my site, I can see that it did indeed add my fit lines and means and my data filter. I can manipulate these like we do in JMP, but you'll see that my outlier company still is a $9 billion company. All right, so we did not update the data. Now, let's suppose I've shut this report down, but I do get information that the sales are indeed $19 billion for drug company number one. I can choose to publish just the data. I'll say, Update existing data. Once again, I'll select my financial post, and I want to replace that. Now you see that it's automatically reloading the data . It's going to recalculate all of our statistics down here as well with new fit lines and that. Now we can see that our outlier is represented here as a $ 19 billion company. If I wanted to, I could even bring up the data with the data viewer that I mentioned earlier , allowing us to explore that, and there is our update to our drug data for company number one. All right, so that is the separation of reports and data, and that provides the foundation for our data refresh. Let's go ahead and get into a real refresh example. Let's minimize this for a minute, and do a little cleanup because otherwise, we will get confused where we're at. 
Now, my next example will be a simple data refresh, and it allows me to introduce another feature that's new to JMP 17, and that is access to the OSI Pi Historian database. If you're not familiar with historian databases, they're often used to collect lots of process data, maybe from manufacturing lines, lots of different machines and devices, putting real -time data into a database. Then you can look at this historian database at your leisure to analyze trends and see where there are problems. We have a historian database with sample data here at SaaS, and I'm going to select that and get some information out of that. Here is our connection to the PI server. I'll open the sample data area. What we have here is a simulated data center where we have lots and lots of racks of computer servers. Essentially, we're looking at power consumption on all of those to see where we're spending a lot of money and things like that. All of these represent a table, and we can import all of these at once if we wanted to. I'm just going to import the data for power meter number one here. This is old data. I'm going to go back in time on it. I'm going to ask for 5,000 points. I'll take just a second or two to import, but we'll start that up. There's our data. I would call your attention right here to this source script. This is important. If we edit this, we'll see that it contains information to recreate this data fetch, including the location of our P I server, the actual table here that we want to import, how many points we want. This will be useful. Let's go ahead, though, and just create a simple run chart and control chart. I can select that, and we're just going to do the values. All right, so there it is. Let's go ahead and publish that to our JMP Live server. All right, I'm going to publish that as a new report back into our Discovery folder. I'm just going to publish it as is. Close that, and we'll bring up our browser. We are going to go to a different session. Hold on. We are going to look at my Brian Corcoran space again. In the Discovery folder, we'll find our Atlanta data center. Now, we can open in that up and see the points and all that. We know how that works. I'm going to call your attention to these Files tab, though. Here we have our financial report. But here's our Atlanta data center report. This is the report, but this is the data table, so let's click on this. There's the report that's based on the data if there are multiple reports that I'll show here. Here's the settings, and here's where it gets interesting. There is our source script that we had down in our table in JMP. Here's something called a refresh script. Let's just concentrate on these two panes now. The source script has been uploaded with the data, and it provides us a basis for how we could recreate this data. A refresh script is a piece of JSL that is going to supply data when a report records it. There's one big rule for a refresh script, and that is that the last operation that it performs must be to return a data table. Essentially, data refreshes are done through JSL scripts. Let's enable this refreshable button. Let's copy this script as the basis for our refresh script. I'm just going to paste this in here. If you remember earlier when we were looking at the P I dialog, I said you could import all kinds of tables at once . Because of that, OSI PI import JSL returns a list of data tables, not a single data table, but a list. Our rule for a refresh script is it must return a single data table. 
We need to get to that point. I'm going to assign this output of this refresh script to a list variable that I just arbitrarily named DT list. I'm going to put a semi colon on this run statement. Now I am going to assign the first element in that list, which is the only table that we have and the only table we care about to a variable named DT. Since that's the last operation in the script, that's what's going to be returned here. While we're at it, though, why don't we go ahead and we're going to change it to return 10,000 points. I'll save that script out. Now, let's go ahead and try this out. Here's a button we can manually refresh our data server side. I'll say yes, we know what we're doing. We want to do this. All right, I said it was done three minutes ago, and then it changes to a few seconds ago. Let's look at our report. Here's our report. Let's look at it. It looks a little different than the one we had back in JMP because now we have 10,000 points of data. We've done all of this on the JMP Live server, not on the desktop client. Now we could recreate that on the server , if we want, without ever having to involve our client. Let's go ahead . That's our first example of data refresh. I'm going to clean this up, so we don't get confused with our other work. A common operation where you'd want to do a data refresh is a fetch of data from a database. One of the big ones is Postgres, and I'm going to show an example of fetching from Postgres. Before I do that, I'm going to go ahead I'm going to change the JMP Live server that I'm accessing. This one is actually outside of SaaS resources. I can go to manage connections here. You can actually look at this yourself. It's devlive 17.j mp.c om. If you go out there, you'll be able to see these reports. For this one, I'm going to bring up Query Builder. On Amazon Web Services, I have created a Postgres database full of sample data. I'm going to open that up. I have some stock quotes for Apple computers that I just put up there for demonstration purposes. Let's go ahead and open that up. I'm going to just build a query quickly. I only have date and quote data. This is essentially the closing quote of the stock price for the end of the day, and it starts on the January 1st of last year. Let's go ahead and just run that query. You see what I'm doing here. Let's take this a little bit further. I'm going to shut this down. Now, another new feature of JMP 17 is something called the Workflow Builder. I'd like to integrate the Workflow Builder into this demonstration tool. Let's do that. I'm going to go over here , and I'm going to say New workflow. It's going to capture my actions as I do them. I'm going to start recording that workflow. Let's go ahead and we're going to run our query again, this time with the workflow recording it. It'll capture that query. You see here, if we look at our source script again, once again, this has information on how to connect to that Postgres database, the table that we want to look at here, and what the query actually is. Now, let's suppose after the fact that... I don't want that. Let's see. Let's suppose after the fact that we decide we want to do some manipulation on this table. Maybe I want to do a subset of the table based on a where clause. Now, I probably could have done this in Query Builder, but I thought of this after the fact. Let's do row selection. I'm going to select where, and I'm going to select where the date is greater than or equal to June 1st of last year. 
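To recap the rule for the OSI PI example above: the refresh script must end by returning a single data table, while the PI import returns a list of tables. A sketch of the pattern follows, where piImportStep() is a hypothetical stand-in for the Source script copied from the data post:

    dtList = piImportStep();   // hypothetical stand-in; the real call is the pasted Source script
    dt = dtList[1];            // keep the first (and only) table from the returned list
    dt;                        // last expression: the refresh script returns this data table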
I'm going to make sure that it's selected. There it is. Then I'm going to do a subset of that table just using the selected rows. Then maybe after the fact, and you'll notice that our Workflow Builder seems to be accumulating this information, I'm going to go and color by column. I'm going to color by this stock quote. This is the data at the point where I feel like I want to do my analysis. Let's stop the Workflow Builder from recording that. Let's go ahead and create a graphic on that. I'm just going to do the quotes by the dates. We see when the stock was high back here in the summer of 2022, and it's gone down considerably. Let's go ahead and publish that. Again, this is going to a new server this time. The first connection can take a little bit of time sometimes. All right, let's publish new. This time, I've set up a space called Discovery Year of 2023 with a folder named Automatic Data Refresh. You can look at this at your leisure and see it yourself. I'm going to name this Apple Stock Quotes Since June. I'm going to go ahead and publish that. Let's go ahead and look at this server. All right, and this one's in dark mode. Makes you realize you're on something different. Here's our report. Let's go ahead, though, and look at that space, and I can search on Discovery to see which one I want to look at. Here's Europe. There's my folder, Automatic Data Refresh in JMP Live 17, and our report. Let's go ahead and look at the files, though. There's our Apple quotes, and there's no source script. What's up with that? Well, let's go back and look at our data here. We look at the source script here. We see that the subset operation just picked out individual rows that were selected. It couldn't go far enough back to understand that this came from a previous database fetch. It just knows it has this table, which was unsaved at that time, and it was picking rows out of it. That's not going to be helpful for us. We have our workflow, and I've stopped that. Let's go ahead and say, what happens if I save the script from our workflow to a script window? There it has captured our query along with our subset operation and our coloring operation. Let's go ahead and copy this and use it as a basis for our refresh script. All right, so let's make this refreshable. We'll go ahead and edit. All right. This does require a little bit of change. First of all, I'm going to assign the returned information to data table variables. This is my query data table, the original full query. I don't really need to, but I'm going to assign that to a variable here because it provides clarity for me. Here's our where clause selection. This is what I really want to capture: this subset operation. I'm going to say this is subset, and I'm going to put brackets around here. The reason is this. The way Workflow Builder builds this is it cascades or chains together operations, and we have a select where here. It's going to take the most recent operation and assign it to my variable. I don't want the selection to go into this subset variable, I want the subset operation. I put brackets around this part to make it just one object that's referred to by subset. It'll be the subset operation that goes into this variable. I'm going to go ahead and put that subset table in here for coloring, and then I'm going to put in an empty reference at the end here to our subset table. All that does is ensure that the last operation is to return that subset table. Let's go ahead and save this. Let's see if it works. Always good to test.
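A hedged sketch of what that edited refresh script might look like follows, written as separate statements rather than the chained form the Workflow Builder generates. The connection string, table, and column names are invented for illustration, and %UID%/%PWD% are the credential placeholders that JMP Live substitutes at run time:

    dtQuery = Open Database(
        "DSN=AWSPostgres;UID=%UID%;PWD=%PWD%;",          // placeholders filled from stored credentials
        "SELECT date, quote FROM apple_quotes",           // hypothetical query text
        "Apple Stock Quotes"
    );
    dtQuery << Select Where( :date >= Date DMY( 1, 6, 2022 ) );           // the where-clause row selection
    dtSubset = dtQuery << Subset( Rows( dtQuery << Get Selected Rows ) );  // keep only the selected rows
    dtSubset << Color by Column( :quote );                                 // the coloring step from the workflow
    dtSubset;                                                              // last expression: return the subset table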
It looks like it did. If it didn't, we could go to the history here, and we can see we did an on-demand data refresh. If it had failed, it would indicate that here, a nd the details pane, which just shows that we got a table back out of this, would instead show an error log from JMP itself. There's a hidden JMP session behind here doing the query for us, and it would provide the log of that JMP session here, so we could get an idea of what's going on. It looks like we have a valid refresh script at this point. However, I just essentially manually refresh data that I already had. That's not particularly interesting. Let's also look at this table. Notice that for the query information, it put in a holder for the password because we don't want to transmit those credentials. Most of the time, it also put in one for a user ID. In this particular case, it did not because Workflow Builder doesn't do that. But if you did this directly from Query Builder, it would. How do we provide the password, though, for this script that's going to need credentials to really fetch new data later on? We validated that this script is going to be okay, but we need the password for new data. What we do here is we're going to use this as sign credentials, and we have this credential that I have created, but I'll show you what you would do if you had none. You create just essentially a username and password pair and a stored credential name. In this particular case, I already had one , and it did get used, but I'm going to select this radio button anyway to make sure that we understand these two are associated. If we had more than one, we would need to have one selected for it to work. W hat's going to happen is my username and password, when it finds PWD or UID, if there's this placeholder with percent in, it's going to substitute those credentials in at the time of the query, so o nly when needed and make the query itself. These credentials are stored using secure string technology in the database, which also can be encrypted. They're very secure, and they're only used in memory at the time of the query. We're pretty sure that we're not going to have our credentials breached. Now, what do we do as far as creating a repeatable operation where we don't have to be around to do this? Well, we use the refresh schedule and hit Create. This is pretty flexible but hopefully easy to use panel. Right now, it's saying it's going to create a schedule that's going to run on every day. I don't want it to run on Sundays and Saturdays. I'm going to exclude those because the stock market is not open. You can have a repeat on a timely basis, like up to 5 and every 5 minutes. I only want it to repeat once a day, so I'm going to turn that off. When do we want it to start? If all your servers are operating in your same time zone, then you don't have to worry about this. You would just put in whatever time you're at, well, put in the time you want it to run. I have to have a little more complex calculation i n my case. I'm running on Amazon Web Services, and we run all of our servers on the UTC universal time, which at this point is 5 hours different. Because I want to show this operating quickly for a demo, I'm going to essentially take the time of my demo and add 5 hours to it and we're going to run that. We're going to put this in as running at 7: 31 PM. When I say okay and save this, it's going to calculate when it's going to run, and it says it's going to run in about a minute. That's what we are hoping for. 
Off screen here, I have an update to a database. Let's just pretend that this was done from some operation that's automatic, but I'm going to actually provide an additional stock quote that shows the stock jumping up in price. Maybe I'd come in in the morning, take a look at my new graphics, see the stock went up, and then maybe it's time to sell. We're just waiting on this at this point and hopefully, you'll see that it gets queued up and our report will get regenerated quickly. While we're waiting on this, I will mention, too, that your refresh schedule can be scheduled to terminate at a certain time. If you want it to end at the end of the year or something like that, you can put that in as well. We saw it refreshed a few seconds ago. Let's go take a look at our report. There's our report. We see it's declining price, and then if we hover, we see that we have a new point here, and it has indeed jumped up in price. That same script, if we go back here, will run 5 days a week, every day at the same time without our intervention . We've provided our credentials. Everything is automatic at this point , and we realized our ambition to essentially just be able to come in and get a new report every morning without having to worry about anything else. You set it up once, run many times, and we're good. That is probably the most important example I'm going to show. I'm going to show one more trick, and it's more of a trick that may help you in certain situations. Let's clean this up. What I'm going to do next is I am going to show using a REST endpoint. You're not familiar with that. Essentially, a lot of organizations make their data available through what looks like a web URL. I have a script that I developed, and I'll describe that in a second. Often you would need a credential or some thing called an API key or something like that to essentially have permission to use this site. Many of them cost quite a bit of money. This one does not. You access this URL, and it returns a whole block of data. JMP has a variety of methods that it provides to help you parse this data apart and put it into a data table. That's what we're going to do here. This particular site is one called Eurostats. It has a free API I was able to use. I'm not going to go into it, but it essentially has this query language that you can append to your URL to tell exactly which table you want and what data points . I have it starting in 2021 and not ending, s o we'll continue to get new data as it becomes available. It returns in this one big block of data, and JMP knows how to interpret that and turn it into a data table here. The data for the date information comes in as character, and we don't want that. We want numeric, and we want it to be continuous, so there's a little loop here that runs to change the column types after the import. If I run this , and what we're fetching, by the way, are natural gas prices in the EU. This data is pretty old. I don't know how often they refresh their data, but right now , they're only providing data for the first half of 2022. Hopefully, they'll update at least with the second half early soon. What do I do? I can look at this column data, and the columns are named specifically for periods of time. I really just want to have a periodic refresh of the data where I just grab the latest one and use it in a report. Let's look at this. If I were to do a graph builder, this is really handy, though, you can drop this in. 
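As an aside before the Graph Builder step, here is a minimal sketch of the column-type cleanup loop just mentioned, plus one way to grab the latest period by renaming the last column (the trick shown next). The variable dt is assumed to hold the imported Eurostat table, and all names are illustrative:

    For( i = 2, i <= N Cols( dt ), i++,                          // skip the geography column in position 1
        Column( dt, i ) << Data Type( Numeric ) << Modeling Type( Continuous )
    );
    Column( dt, N Cols( dt ) ) << Set Name( "Most Recent" );     // latest period, whatever it happens to be called
    dt;                                                          // if used as the JMP Live refresh script, end by returning the table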
The geographic and time period columns can help you map out the map of Europe. If I drop in the latest one — unfortunately, there are some missing pieces — it will give us a general idea of what gas prices averaged in this first half of 2022. Again, if I were to save this out, you'll see that it's going to use this date-specific column. That's not what we desire. A little trick you can use here is to just rename the latest column. Here I have another little scriptlet, and I'll open that up. Essentially, all I'm going to do here is say, let's take the last column and rename it Most Recent. Then if I want, I can create a Graph Builder script that uses Most Recent. If I do this, there's our column. Now I can bring up Graph Builder and drop in our geography column and our most recent data. I can maybe enlarge that a little, and we can go ahead and publish that. Let's go ahead and put that in our folder under the name Gas Prices in the EU. I'll publish that. We'll refresh our web page here. There's our gas prices graphic, just like we hoped, and if we hover over it, it'll show the mean gas price and things like that. If I look at this data, though, and we go into our Refresh settings, it didn't understand all of our script about getting our REST endpoint and things like that, but that doesn't matter, because we already have our script. Over here, I can just copy this as is and put it in here. We will return that table as our last operation. Maybe I just want to run this one day a week or something like that, so that whenever they finally publish new data, we periodically pick it up and see the latest thing. This is a case where you're just occasionally viewing the data to see if there are any updates. Of course, we'd want to make sure manually that this works before we move on to other things, but we can see that it did indeed do the report, and it shows as updated a few seconds ago. Even though the data is the same, we know it's working, so that when new data does come out, we will grab it and populate this graphic with it. Again, this is on devlive17.jmp.com. You will need a SAS profile ID to log into the server if you want to look at this, but I will leave it out there for you to take a look. That concludes our data refresh examples. I hope this gives you an idea of some of the powerful new capabilities that JMP Live provides. I appreciate you attending this talk. Thank you. Transcribed with Happy Scribe
Labels
(8)
Labels:
Automation and Scripting
Basic Data Analysis and Modeling
Content Organization
Data Access
Data Blending and Cleanup
Mass Customization
Quality and Process Engineering
Reliability Analysis
0 attendees
0
0
0 attendees
0
0
My Gauge Isn’t as Good as It Could Be—Will Its Errors Cost Money? And How Much? (2023-EU-30MP-1277)
Saturday, March 4, 2023
All gauges have errors. They might be minuscule, or they might be large, but they always exist. Large or small, the errors lead to gauges having some likelihood of making Type 1 and Type 2 errors (passing a bad part or failing a good part). The mistake likelihood is higher for parts that lie near the specification limits. These errors cost real money! But how do we quantify those costs? This paper builds on the results shown in a 2022 JMP Americas Discovery paper (2022-US-30MP-1123) that discussed how to quantify the gauge performance and how to set “informative manufacturing specs” (or guardbands) to improve the gauge’s performance in segregating good vs. bad parts. In this paper, we extend the learning and script functionality as we discuss how to combine gauge characteristics with the costs of individually passing a failed part, rejecting a good part, and projecting production volumes. This gives insight into risk analyses, e.g., how much I should budget to account for gauge errors, whether (or how much) to spend on improving our gauge, etc. Hi, I'm Jerry Fish. I'm a support engineer with JMP, helping customers in the central part of the United States. Today's talk is entitled My Gauge Isn't as Good as It Could Be— Will Its Errors Cost us Money and how much? I'm Jason Wiggins, also a senior systems engineer, and I support semiconductor users in the Western United States. This talk is a follow on to one we did for discovery Americas in 2022. In our first talk, we introduced the notion that measurement systems are integral to our businesses. In fact, we have many measurement systems we interact with in our daily lives. Along with that idea, we introduced the notion that measurement system or gauge variation can impact decisions in real world inspection situations. We introduced gauge performance curves as a way of visualizing gauge variation and relative to specification limits. In this talk, we'll extend that and explore the costs associated with gauge variation through a fun role play conversation between a quality manager of an automobile manufacturing plant, that'll be Jerry, and I'll be acting as a quality consultant. To kick things off, I'll get on a quick team call with Jerry. Hi, Jerry. Hi, Jason. How are you doing? I'm doing pretty good. Thanks for spending a few minutes with me. As a quality consultant, I help quality stakeholders like yourself understand and improve processes. Now, I prefer using JMP as it's a general purpose, easy to use data analytics package that has many quality and process control features. JMP makes quick work of the analytics part of process improvement, so more time can be dedicated to actually improving the process. Well, it is nice to meet you, Jason. Just to let you know, though, we already have software in place for our internal quality programs, so I'm not really sure what your software can do that we cannot already do. Can we make this quick? I understand completely, Jerry. I'll try to make the most of your time today. First, can you tell me a little bit about your company and your quality program? Sure, happy to. Acme Motors has built a reputation with our customers of manufacturing the highest quality cars. We're always concerned with quality. We have various gauges that we use to ensure our quality stays high. We've been doing this for years, and frankly, we think we're pretty good at it. I'm familiar w ith Acme Motors and your high quality reputation. 
My consulting team and I have recently been working with manufacturing companies like yours to advance the use and effectiveness of gauge studies. Measurement systems analysis, another way of saying that. One of the things we seek to understand are the monetary costs associated with the gauges used to measure process quality characteristics in your manufacturing plant. Have you quantified how much any of your gauges are costing your business? I'm not sure what you mean. Well, gauges are not perfect. They make mistakes. Sometimes they'll throw away good parts and sometimes they'll pass bad parts. Unless you have a perfect gauge and really no one has these, these mistakes are inevitable. I suppose so, but we've done gauge studies that say our gauges are good. Well, some of them are actually categorized as adequate by the AI AG guidelines. Doesn't that mean we're okay to use them? Well, possibly, but there is a lot more to the story than just using good, adequate, and poor AI AG gauge assessment criteria. For example, have you seen a gauge performance curve? I can't say that I have. No. Now, this is what one looks like. The X axis shows the true part values and the lower and upper specification limits are shown with these lines. The Y axis shows the probability of passing a part. If you have a part that is truly good but very close to the lower spec limit, there's almost a 50 % chance the gauge will recommend that you throw it away. That's one way of thinking about it. But also, there's nearly a 50 % chance that you will accept a part that is truly bad and near the lower spec. Very interesting. What happens if we could change the variation of the gauge then? Well, the shape of this curve definitely depends on how good your gauge is. Let's play with this just a little bit. What if we could reduce the variation by a factor of 10? Make a quick change here and replot our gauge performance curve. If this is possible, we will correctly accept or reject more of the parts. We're moving from incorrect to correct when we do this. Let me break from the role play for just a moment. The gauge performance curve I am showing is an add in that we made for our 2022 Discovery Americas presentation. The add in is available on the community. Back to you, Jerry. That is really an interesting chart, Jason. I don't think we do anything like that. What you're saying is that the gauge errors contaminate the measurements, but all I have is the imperfect measurement. Your gauge performance curve is plotted versus true part values. I wish I knew those true part values, then I could know exactly which parts to keep and which to throw away. Is there a way I can know the true part value? We could know that directly if we had that ever elusive perfect gauge, which we don't. We really can't ever get to the level of knowing the true value of an individual part, but we can estimate the true part distribution given our knowledge of the gauge characteristics and the measured part distribution. How would you do that? Well, if we assume that gauge errors are normally distributed, and for the moment, let's ignore any bias or linearity problems that you might have, and that we have the measured part distribution. If we have that, we can back out the variance of the true part distribution using a simple equation. That simple equation is just the difference between the measured part variance and the gauge variance. 
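In JSL terms, that back-out is just a subtraction of variances. A small sketch, using the example values the speakers turn to next (measured variance 25, gauge variance 16), and assuming normal, unbiased gauge error as stated above:

// True part variance = measured part variance - gauge variance
measuredVar = 25;
gaugeVar    = 16;
trueVar     = measuredVar - gaugeVar;   // 9
trueSD      = Sqrt( trueVar );          // 3; SD of the estimated true part distribution
Show( trueVar, trueSD );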
The plot on the right is shown for a situation where the measured variance is 25, the gauge variance is 16, and if we subtract those two, the true part variance is nine. We're beginning to get the parameters for that distribution because we know the results of our gauge study and we understand the variance associated with our gauge. Now, we would, from this, build a normal distribution that centered on the measured distribution mean with the new standard deviation. A gain, the result of which is going to look like the plot on the right. In the plot, the blue bars represent your measured part distribution. The areas above and below the spec are are shadeded in pink. Notice that the measured part distribution is much wider than the true part distribution. The measured distribution is what you get when you run the true part distribution through your imperfect gauge. Okay, that makes sense, at least a simple case. How do you relate this to what it's costing my company? Well, we can use this information in a numeric simulation to characterize the mistakes that our gauge is going to make. When we do that, we can generate a part inspection table like this. Let me study this table for a minute. My eye is immediately drawn to the center, the green box that says 95.4 %. Am I interpreting this right? 95.4 % of my total production is truly good and we're shipping it. Correct. That's a good thing. Now, looking at the first and last columns, if I add 18 and 25, let's see, that's about 0.043 % of my production parts are truly low parts. A nother 0.041 % on the last column are truly high. This is bad. It says my process is making bad parts that must be thrown away or reworked. I see another problem. If I look at that center column and I add those all together, I get 99.9 % of my production parts that truly are good. It says that the gauge is identifying 2.3 % of those as too low and 2.3 % is too high as well. Now, the customer doesn't care about that. They're still getting good parts, b ut I certainly do. I'm making good product and I'm throwing it away. Worst yet, look at that center row, those red squares. There's another 0.036, 18 and 18, of truly bad parts, parts that are too low or too high that are being accepted by this measurement gauge. This is serious. I do not want to ship bad parts to my customer if I can help it. That's right. This is cool. You're beginning to see the cost of having an imperfect gauge. This is really interesting. It shows that if I don't do something about my imperfect gauge, I'll risk accepting bad parts and throwing away good parts, both of which are bad for my business. On the other hand, I think we've got a way to handle this, Jason. Okay, what's that, Jerry? Well, we use something called guard bands. These are our bands that are set inside the specification limits. If we set them far enough inside the spec limits, we can reduce and essentially eliminate shipping bad parts. Doesn't that fix at least part of our problem? At least, guard bands are definitely a good way to reduce the percentage of bad parts that make it through your inspection process. A lot of companies use them. Have you considered the fact that improving quality using guard bands comes at the expense of throwing away good parts? Honestly, that has occurred to us, but we haven't tried to quantify that damage. Well, let's extend this example out a little bit more and let's just assume that we bring those specifications in by one unit of measure. We'll call these guard band limits. 
Our lower guard band limit would be 41 and our upper guard band limit would be 59. We're going to use this as our inspection screening values instead of the original upper and lower specs. Now, we can do the same numerical simulation and update the results. Let's just take a look at the differences between the tables. Can you see how the percentages have changed? We went from shipping roughly 0.04 % of parts that were truly bad to only shipping 0.03 % of bad parts. That looks successful. Maybe we could even squeeze our guard bands in further and improve that. Especially given our high production volume. We're talking real bucks here. It is. Also notice how many truly good parts are now being screened out. Every time you screen out and throw away good part, it is costing your company money. Well, you're right about that, Jason. Is there a way to look at this monetarily? What if we assume that a bad part in the simulation results in a bad car? Can we input the cost of scrapping the car and see how that affects the bottom line? Absolutely, we can do that. I'll need to get a little information from you, though. First, how much does it cost to make the car? Yeah, let's say for the sake of this demonstration, $35,000. Okay, great. That means for each rejected car, it costs your company $35,000. You might manufacture in rework costs here, but let's say, for example, we just throw the car away. Now we need production quantity. I don't know. Let's just choose a million cars. Okay, great. Now, how much do you charge for a truly good car that makes it to a dealership? The dealerships buy... Let's just say they buy these cars from us for $40,000. Okay. If I understand this right, your profit per car is that 40K minus the 35 K and your profit has been $5,000 per car. Right. Last thing, do you know the cost associated with selling a bad car? That's a little tougher. There are the obvious costs of repairs to the bad car or potential cost of return. Those are relatively easy to calculate, but there's also damage to our reputation. Our customers demand quality, and if we start putting bad product out the door, it can quickly get out of hand and result in lost future sales. That's a lot more difficult to calculate. I know you need the number. For the sake of argument, let's just say that totals to $50,000 per bad car that makes it out into the market. Excellent. Let's take a look at the profits and losses. Same simulation. Just review, make sure that we're looking at the correct values. You told me that manufacturing cost per car is $35,000. You then sell that to a dealer for $40,000. Our profit is $5,000. Cost of selling a bad car is $50,000. We're going to look at this across a 1 million car production run. Have I captured e verything, right? I think that looks good. All right. If we look at the net profits and losses, you stand to make about 346 billion from the 1 million cars you make. That sounds good. Not bad. The total profit from the truly good cars that are shipped is about 371 billion. The loss due to making truly bad cars that are caught in your inspection is 199 million, which is the sum of 98 million plus 101 million. Okay. The law... I let you digest for a second? Yeah, I'm following. Okay. The laws due to shipping truly bad cars, this is the one you are really concerned about, is 137 million, which is 68 million plus 69 million. Finally, the loss from scrapping truly good cars, this is what's costing your business, is $25 billion. That's quite a lot. 
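One way to organize the arithmetic behind the profit-and-loss summary just quoted is sketched below. The cost inputs come from the conversation, but the pass/fail fractions are purely illustrative, and the talk's simulation may book the loss terms differently.

// Cost inputs from the conversation above
costPerCar = 35000;    // manufacturing cost per car
salePrice  = 40000;    // price to the dealership
badCarCost = 50000;    // total cost of a bad car reaching the market
nCars      = 1000000;  // production volume
profitPerCar = salePrice - costPerCar;  // 5000

// Illustrative fractions (NOT the simulated values from the talk)
pGoodShipped  = 0.95;     // truly good and shipped
pGoodScrapped = 0.045;    // truly good but failed by the gauge
pBadScrapped  = 0.0008;   // truly bad and caught at inspection
pBadShipped   = 0.0004;   // truly bad but passed by the gauge

netProfit = nCars * (
      pGoodShipped  * profitPerCar   // profit on good cars shipped
    - pGoodScrapped * costPerCar     // good cars thrown away
    - pBadScrapped  * costPerCar     // bad cars caught and scrapped
    - pBadShipped   * badCarCost     // bad cars that reach the market
);
Show( netProfit );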
That's the sum of $12.4 billion and 12.4 billion. That's fascinating, and also a little depressing, that we're losing that much money. If you change things, let's say you change the guard band settings, will the total net profit change? That's right. That's definitely true. You could see that change. In that case, then, could there be an optimum? I can imagine widening the guard bands or narrowing them and looking at the net profit; would there be an optimum where that net profit peaks out? Yes, you can definitely explore that trade-off in a lot of different ways. You could answer questions like, how would improving my gauge by a factor of 10, like we showed with the gauge performance curve, improve my profitability? Or how much can I afford to spend on fixing or replacing a gauge? If we know what the costs to the company are for our measurement system, then we can justify the cost of fixing or replacing the gauge. Also, just to your point, what if I adjusted my guard bands? We can definitely answer that question. Another common one is, what if I improve my process capability? If I just tighten the variation in my process, what does that do to my profits and losses? I could trade that off against the cost of improving that process capability. Interesting. Well, I must say, Jason, I'm impressed. This has been a good use of time, but I think I owe it to my company to muddy the waters just a little bit. This is all great for normal distributions and simple gauge errors and those kinds of things. The calculations that you've shown are easy. But what if I have gauge linearity or bias problems? Or what if I have a skewed distribution, which is really pretty typical in my company? We rarely run into the nice bell-shaped curve. Getting a true part distribution out of the measured part distribution becomes a lot more difficult than just using that simple formula you showed earlier. Can you even do that? Absolutely. We are writing an add-in that will let you define the shape of any measured part distribution. We can do the same exercise with measured part distributions that are normal, log-normal, uniform, Weibull, or even a custom distribution. It's an add-in we're working on. It's a work in progress. All right, that's fantastic. I'm ready to buy in. When will that be available? We have the basics of the add-in worked out, but we need some time to make it more user friendly. We'll be working on that in the coming few months. Probably before midyear, we'll have that wrapped up. When it's done, we'll post it on the JMP website in our Community File Exchange. A few months, really? I'll forget all this by then. That's okay. We recognize that. Once our add-in is ready for prime time, we'll announce a series of open-to-the-public seminars where we will go into detail about what you've seen here, as well as other aspects, like relating these concepts to Donald Wheeler's EMP methodology, which is another personality in the Measurement Systems Analysis platform in JMP. Here's a quick peek at the topics for the upcoming talks. We'll spend more time elaborating on how gauge studies that use the AIAG classification we talked about earlier can lead to unrealistic gauge assessments. We'll also explore how Wheeler's Evaluating the Measurement Process, the EMP method, can provide a more realistic gauge classification. We're going to present the problem with AIAG and present the solution using Wheeler's methods.
We'll also show how the EMP method can advise us on how to use our gauge. How do we use it in the production process? One example that we'll be covering is objectively setting guard bands. For the remaining topics, we'll spend a little bit more time interpreting gauge performance curves and talk about how to blend gauge performance with part variation to determine the cost associated with imperfect gauges. Really, that is what we're talking about today, but we feel like we need to extend it a little bit so that we all understand how it works. The final two topics: how Wheeler's calculations can be factored into this gauge cost conversation, and how to understand gauge cost, again, in the case of non-normal part distributions. That's perfect. Can you make sure that I'm on that invitation list? I want to make sure that everyone in my quality department attends your seminars. Sure thing, Jerry. Anything else I can do for you today? Yes. Get back to work on that add-in. The sooner it's available, the better. Will do. All right, this concludes our presentation. I'll say that as we were doing research for this talk, we uncovered many concepts that are important to understanding how to use measurement systems. We feel like each one of these concepts deserves more time than we had in our talk today. We look forward to continuing the conversation with you in the coming months. Any closing thoughts, Jerry? Just that, for you kind folks attending today, if you're interested in attending those upcoming seminars, please let your local JMP support person or your support team know, and they'll make sure that you're included on that invitation list. Excellent. With that, thank you, everyone, for attending. Thanks, all.
Labels
(7)
Labels:
Advanced Statistical Modeling
Automation and Scripting
Basic Data Analysis and Modeling
Content Organization
Data Exploration and Visualization
Quality and Process Engineering
Reliability Analysis
0 attendees
0
0
0 attendees
0
0
Enrich Databases With Useful Information (2023-EU-30MP-1306)
Saturday, March 4, 2023
JSL can be used to make GUIs to access data in measurement systems with pre-programmed scripts, such as for SPC. For this to work, JMP® must remember the control limits from the status quo when new data arrives. A second JSL script is needed to connect to a DB and load existing limits for variables if they are present. Control limits can be altered/added manually and updated in the data table or automated with default values and a factor for multiple variables. When others pull the data again from the DB, the control limits are automatically added to the column properties and go into the SPC. This presentation will provide a live demo and JSL example of how to insert and update data in a DB. Hello, my name is Mauro Gerber. I work as a data scientist for Huber and Suhner in Switzerland. I would like to introduce you to the problem we had regarding SPC. I want to talk about SPC scripting and why it's important to write the information back into a database. What we have is an optical measurement system that measures dimensions on parts, and we store those measurements in a database. The goal was to get the data back out of it and do statistical process control on it. What happened was that over time, some variables can shift, and the worst case would be that we go out of spec and then have to take countermeasures to get back in. It would be preferable if we could act beforehand. The idea of SPC is like telling a dog to stay where it is. We achieve this by defining a stable phase and saying that if the process moves out of that window, it gives a notification before it goes out of spec, and we can take countermeasures to get back to stable again. One way to achieve this is with the Process Screening platform. We can sort by the control chart alarms, which tells us how many of the variables violate Test 1. Test 1 is simply being outside the control limits. As you can see, I get the most alarms from the most stable process. This is a bit of a paradox. As you can see here, I have a very stable process, and the SPC limits that get calculated automatically give me a lot of false positives. I shouldn't react to these, because they are just within the normal variation. The second problem is that if I have an order and analyze the samples, it calculates the control limits for me. Like in this example, it says it's all good. If I later analyze a second order, the calculation of the limits automatically switches to the new process variation. It makes the window bigger and again says, hey, everything is okay, until we see, of course, that the process moved and the variation got bigger. So what I did was make an SPC script that deals with the measurements and loads and stores SPC limits in the database, so we can have a properly working SPC. I would like to switch to a demonstration of how this looks with our database. What we have is a script that imports the data. What it does is go into the database and search all the product orders we have stored. For this demo, I made a special data set, and it automatically contains this program; for us, a program is like an article or part. We can select, say, PO1, the first one we did in our example. I can say okay, I want to have a look at that data, and in the back, this is the final result. As you can see here, after [inaudible 00:04:23] and SPC. In our example, this is what happened: I have a stable process, everything is good. Looking at all processes, this is where the error came from. Again, I can show you the SPC.
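For reference, the import script described here amounts to pulling one product order from the database and opening a control chart on it. A minimal sketch, with the DSN, table, and column names as placeholders:

Names Default To Here( 1 );
dbc = Create Database Connection( "DSN=MeasurementDB;" );  // Windows authentication assumed
dt = Execute SQL( dbc,
    "SELECT * FROM measurements WHERE product_order = 'PO1'",
    "PO1 measurements"
);
Close Database Connection( dbc );
dt << Control Chart Builder( Variables( Y( :X2 ) ) );  // look at one variable's SPC chart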
You see here, in the beginning the process was okay. Then there was a phase where the variation got bigger, and then it went back again. This is X1; the problem was with X2. As you can see here, I got some parts out of spec. This is what I would like to change now. As we discussed earlier, what we need to do is go to the first PO. I set this as the table I want to go back to, and I now run the script that goes over the active table and extracts every line that has spec limits. I see here, these are the spec limits, and you can see here the empty control limits. What I can do now goes two ways. I can either activate this SPC here, have a look at the data, zoom in a bit, and manually say that on X2 the control limits need to be 2.98 and 3.02, then tell the program, hey, please update the limits for me — no, that was wrong, sorry: 2.97 and 3.01 — and update. The limits I just set are now in the control limits, which I can check: on X2 it added the control limits for me. The problem now is that if I close the table and re-download the data, these control limits are lost. What I can do is save it, close everything, and reload the data. The script now checks whether there are control limits present — I set them for X2 and none for X1 — and if I now open the SPC for X2, the control limits are set as desired. If I have a lot of variables in my data set, like 20 or so, it could be quite tedious to set the limits manually for every one. This is why I made the script so that you can select the desired variables and set defaults. As I showed you earlier, if I set the factor to one, it would copy the automatically calculated limits from the system, which can be too tight. In this case I say okay, I want the margin twice as big. Now it runs through every row and sets the limits automatically. I save the limits to the database, and if I load the limits again, you can see they are centered and have a nice window around them. Quite easily, I can manage limits for a lot of variables. Now, going back to the problem we had: I select all of them, run the script again, and run it again. Now it takes over the limits I set earlier. As you can see here, I would have gotten warnings early on that something was wrong, and we could have prevented this from happening. As you see here, with some countermeasures, we are now back in the stable phase. What I use in the script is a search for spec limits to identify which columns are the variables to work with. This can also solve problems with platforms that depend on spec limits: I can filter them and only create control limits for SPC or spec limits for the process capability platform, so I don't get an error message that limits are missing. What is important, of course, is security. A habit of JMP is that the source script is stored in the file: whatever password or database connection you use gets stored in the table. This, of course, can reveal the server name or even the username and password. To get around this, there's a preference, ODBC Hide Connection String. What you can also do is encrypt the code containing the password and username in the script itself — you see it in here — so people are unable to read it. When I write into the database, I use Create Database Connection. I set the reference, and then there is an SQL statement I put together: INSERT INTO, then which database table it is, then what I write — from the list I generate the program names, the parameters, and the control limits. Then I execute these SQL statements from the connection reference and the SQL string.
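The two write-back steps just described, inserting the limits into the database and stamping them onto the column, might look roughly like this. Table, column, and property details are assumptions, and in particular the chart-type key inside the Control Limits property has to match the chart you actually use.

Names Default To Here( 1 );
// 1) Write the limits back to the database (names illustrative).
dbc = Create Database Connection( "DSN=MeasurementDB;" );
sql = "INSERT INTO spc_limits (program, variable, lcl, ucl) " ||
    "VALUES ('PO1', 'X2', 2.97, 3.01)";
Execute SQL( dbc, sql );            // check the JMP log afterwards for SQL errors
Close Database Connection( dbc );

// 2) Apply the limits as a column property so the control chart picks them up.
// The chart-type key ("Indiv Measurement" here) is assumed; use the one matching your chart.
Column( "X2" ) << Set Property( "Control Limits",
    {Indiv Measurement( Avg( 2.99 ), LCL( 2.97 ), UCL( 3.01 ) )}
);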
What's important is that when the log gets longer — that is, when I get an error message from the SQL — it beeps and puts it in the log. When something is going wrong, I can check it: maybe the credentials are wrong, or the connection could not be built up, or whatever else is possible. Then I write the control limits into the data table. I check for the name of the article and whether there are SPC limits present. If there are, I update the column properties with the control limits, which then get displayed automatically in the SPC, and the program handles them accordingly. A second use for writing something back is if we have a measurement, then run a test on the part, like an environmental test or endurance test, and make a second measurement. Then I have the same part with two measurements. Or I have a false measurement — something went wrong — so I retake the measurement, and the second measurement is actually the one that counts. This I can handle with a small script, Update DB. It gives me a dialog asking whether I want to set the measurements inactive or update them. If I want to update them, I can type which label I want and what its name is. It works similarly to the Name Selection in Column function. What it does is give each measurement a unique ID, even if the part itself has a serial number. I have the serial number measured twice: after zero hours and after 100 hours of test option one. It got different measurement IDs. This is how I differentiate between the measurements. For the inactive part, it's simply that in the SQL statement I can say that if a measurement is set inactive, disregard it, so these faulty measurements don't show up. Key points: be careful that you do not give out sensitive information like passwords and usernames. You can hide certain parts of the script with encryption — the code is JSL Encrypted, and the encrypted code is then in there as text. Use the preference ODBC Hide Connection String. Another way is to use Windows authentication to avoid credentials altogether. It also helps to allow writing only to specific columns, like the Label 1 and Label 2 columns, to avoid manipulation. We have the policy that a user can set a measurement inactive, but cannot delete it. If we go back to the database, we can restore all the data if something went wrong. Another good practice is to check that the data is actually written into the database, like a handshake, and to enrich the data with important information that can be handy for users years later, because they may not have the information about why you took three measurements of the same part, and may be wondering why it's getting worse; this way, everyone has the same information to work with. I can demonstrate this little feature; let me go back in here for the demo. What I do is select those rows, use Name Selection in Column, and that gives me a column which I can then save. If I reload the data, this information is lost. What I can do instead is use this Update Database feature. I can update it; I can say, hey, Label 1, this is stable. You can see here, Label 1 is now marked as stable. I can go and say okay, I want to update these measurements as well, close it, and simply pull the data from the database again, and as you can see, the stable and unstable phases are stored. If someone looks at it later, they can draw conclusions from the phases; otherwise this information would be missing. This is the stable phase, this is the unstable phase, and the same goes for the data. I can also select some measurements and set them inactive.
It will warn me if I want those three set inactive, and the next time I call the function, they won't show up. If I want to have them back, I can select here to include error measurements, and those measurements come back again. That concludes my script. I thank you for listening, and if you have further questions, I will be [inaudible 00:19:42] later on. Thank you very much.
Labels
(11)
Labels:
Advanced Statistical Modeling
Automation and Scripting
Basic Data Analysis and Modeling
Consumer and Market Research
Data Access
Data Blending and Cleanup
Data Exploration and Visualization
Design of Experiments
Mass Customization
Quality and Process Engineering
Reliability Analysis
0 attendees
0
0
0 attendees
0
0
Automatization of Immunohistochemistry Data Analysis for Protocol Validation (2023-EU-30MP-1318)
Saturday, March 4, 2023
Cerba Research is a global company providing high-quality, specialized analytical and diagnostic solutions for clinical trials. Cerba Research Montpellier develops customized immunohistochemistry protocols to detect the expression of selected targets on patients’ tissue sections. To be used in clinical trials, these protocols must meet the regulatory agencies’ criteria to ensure that the protocol will allow consistent results on precious patients’ samples. With the diversity of parameters evaluated and the types of evaluation possible in implementing these custom protocols, automating data analysis became a need. Thanks to various JMP® tools, we have developed an automated analysis that saves time and homogenizes protocol performance reports by including statistical and graphical data in a Dashboard. This process, submitted as JMP Add-in, has been incorporated into our user workflows, thus facilitating our procedures. Used version: JMP v16.1.0 Hello. My name is Marie Gérus-Durand, and I'm working for C erba Research Montpellier. Today, I will show you how we set up automatization of immunohistochemistry data analysis for protocol validation in Montpellier using JMP and its tools like dashboard and add-in functions. First, some words about Cerba Research, it's a worldwide company with capabilities in all the continents. Here I highlighted in yellow the department I'm working for. It's a Histopathology EHC Department. As you can see, I'm based in Montpellier in France. W e have also other labs in US, in New York, and in Taiwan in Taipei. First of all, what is immunohisto chemistry? The aim of the technique is to detect targets protein mainly on a tissue sample. Here you have a slice of a tissue, for example, when you do a biopsy. We will look at the targets of interest using antibodies, which will detect the target. This antibody is recognized by another one which is combined with chromal fluoroform or active components that allow the detection of the target. Here you see, for example, these three components are highlighted, meaning that antibodies bind it and we can detect it. After the experiment, we can can look at the slides under microscope or using a scanner, which allow visualization and analysis of the results. On the next slide, I just zoom in so you can see better what it looks like. Here it's a skin sample and you have a cell nuclear in blue and the target of interest in red. One of the challenges within immunohisto chemistry and histopathology is that you have many possible protocols, colorations. Here on the left, it's two different histological colorations. It doesn't involve antibodies like I show you, but it just reactive with the different components of the tissues. You see here for the MOVAT, we have five colors. For the HE, we have only one color, but the intensity depends on the type of structure you are looking at in the tissue. On the right, you have two immunohistochemistry protocol. One simplex, we called it, because we detect only one target, and it's a chromogenic here. It's in brown. On the top right, you have a multiplex. H ere we detect many components. Here it's a fourplex, four targets on the same slide, and each is revealed by different flow of work. You have different color for each of the targets. Among of these coloration detection possibilities, then you have a multitude of possible analysis method. 
The slides can be analyzed by a pathologist, which will give us qualitative or semi- quantitative data, or by image analysis, which will give us quantitative data. Another layer of that is that you can have reportable parameters which are single. For example, if you have a simlplex, you detect the targets, only one target, and you assess only one parameter like percentage of positive cells, for example, or you can have many. For one target, you can have the percentage of positive cells and a specific histology score. Or if you have a multiplex, then you can multiply this for all targets in the multiplex. Each report level parameter is target- dependent. You can imagine that we have a lot of combination that we can have to access during our validations. In Cerba Research M ontpellier, in 2022, we have a small part, like 20 % of our project related to animals. We are studying animal samples. The other projects were on human samples. Among this, most of them are four clinical trials of the project. That's a very [inaudible 00:04:56] . W e have some others that are outside clinical trials, a quarter of them, and a small portion, 3 % that are CAP compliant. CAP is a specific regulation for US. It is to know that before being used in a clinical trial, we should demonstrate that our protocol that we developed in Montpellier show consistency in results for section of the same sample. If we analyze the sample at different type points, for example, the samples of patients involved in the study, the first year should be the same than in the five years after. On the different automatons, we have at our different sites, and when the samples are analyzed by different operators or pathologists. This applies a rigorous validation according to the health agency, and this validation is mainly based on statistical criterion. The implication for the company is that we need to increase the team members to support the increasing number of projects we have each year, and we need a normal genus statistical analysis pipeline to be sure that we will give all our clients the same type of results. Obviously, we need statistical analysis tool, and it's when we choose JMP to support our validation of the protocol. Today, I will show you only a part of what we are doing because I don't have time to show you everything. I chose a quite simple example. It's a kind of experiment we do to validate the precision of the protocol, meaning that we check the intra-run , which is called repeatability precision over three slides. Here are the three ones highlighted in purple in the same cycle. The three slides I run at the same time, they come from the same sample, and we just check that we have the same data. The inter-run, reproducibility, test over two slides highlighted in blue in each cycle. In total, we have six slides over three cycles. For reportable parameters, I will use an image analysis dataset, which is quantitative data and usually easier to analyze. We will have two reportable parameters for one target. How do you start? First, we need to import the data. I don't know if you are familiar with that. But in our case, we have data either from Word documents, so we use the Word Import Tool available on JMP community website . I put the link here so you can go and find it again. Or we import data from Excel either directly by opening the file in JMP or by using the JMP tool in Excel. 
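As a small sketch of the Excel route just mentioned (the file path and sheet name are placeholders; Word documents go through the Word Import Tool add-in from the JMP Community instead):

// Direct Excel import in JSL; path and worksheet name are illustrative.
dt = Open( "C:\Projects\ProjectX\results.xlsx", Worksheets( "Sheet1" ) );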
For this presentation, just to be faster, I create some script to help me to focus on the dashboard creation after that, which will take more time. But this is just some script and just to be faster, but I will not develop into it then. Here I will open a data table from Project X here. As you can see, it's a quite simple table. I have four columns with the validation, the slide ID, which are internal slide numbers, and the dataset for my two reportable parameters. This data continues because they come from image analysis. Once I have this, I will need to prepare my data. It's a most time consuming part of analysis data. Again, it's why I have a script. You see the five columns where I did. The sample to be able to correlate each data to the same sample, which is a part of the slide ID we have internally. I just get a formula here to help me to do that. The slide number, which are the last digits of the slide ID, and the slide order, the 1, 2, 3, 4, 5, 6, 7 for each sample. Thanks to this slide ordering, I would say, I implement the repeater. The three first slides which were staying in the same cycle for repeater, and two first slides of each cycle for reproductivity test. Here I have all the information needed to do my analysis. I go back to my journal, and we will want to do the dashboard question. I still have some steps to do before that because I would like to have all the analysis I want to put in the dashboard. Here are the two little table you see where we are required to analyze the CV of our protocol for each sample for repeatability. I selected only the three first slides thanks to the local data filter. The same for reproducibility, where you see I have the slides from the reproducibility column. Here are the data that I need. This data I updated them in here. You see I have much more columns now. It's easier to find the name here. I have the sample CV for repeatability, for reportable parameter 1 and 2, and then the same for reproducibility for the two parameters, and then I do the mean of both samples for each of these columns. Here, all the data I will need to implement in my dashboard. I can cross this table. I don't need them anymore. I will now do the graphs and tables that will really fit in the dashboard. Again, what I want to show to the client is the distribution of the data for repeatability. I put as well the standard deviation and the mean. Usually, it's pretty good here and for reproducibility where I have all my six slides. Then I would put a table with a CV for each sample for repeatability and the reproducibility on the left for the first parameter and on the right for the second one. The same outline for the mean of the two samples. These are the four part I want to show on my dashboard. I will show you how it looks like. I want to obtain something like that. This is often I did that I can show it, the two graphs and the two table. It's what I would show you how to do now. You see that all the graphs are to the same data table, sorry, and it's much easier to do the dashboard after. I saved as well all the scripts so I can redo them whenever I want. I will create a new file, new dashboard. You have many templates. I usually start from blank and just you have to put in what you want to see. Sometimes it's a bit difficult because it's small, but we always manage to find our way. My table at the bottom, you just drop them where you want to have them. It's pretty simple. You can change the names of each part, so I will not do it just. 
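A rough sketch of those preparation and CV steps in JSL follows; the slide-ID parsing pattern and column names are assumptions about the internal format, not the presenter's exact script.

Names Default To Here( 1 );
dt = Current Data Table();

// Derived columns for the data-preparation step (parsing pattern assumed).
dt << New Column( "Sample", Character, Formula( Word( 1, :Slide ID, "-" ) ) );
dt << New Column( "Slide Number", Numeric, Continuous, Formula( Num( Right( :Slide ID, 2 ) ) ) );

// CV per sample for one reportable parameter: 100 * SD / mean within each sample.
dt << New Column( "CV % RP1", Numeric, Continuous,
    Formula( 100 * Col Std Dev( :Reportable Parameter 1, :Sample ) /
        Col Mean( :Reportable Parameter 1, :Sample ) )
);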
But you can see that you can edit all the parts. You can run your script and then give you the dashboard. It's pretty similar to what I showed you before. I have my two tables at the bottom and my two graphs at the top here. I have inverted the two, so I usually prefer to start with repeat that. I will just switch them. If I put it where something was already, they just switch. Here we are. This is the layout I want, so it's good. After, you can play to see better, more or less, of the table, et cetera. But this is just for visual [inaudible 00:15:00] process. Then, okay, now we have dashboard, but it would be more interested if we can directly go for the dashboard when you have all your data you want on the dashboard and click and you have a dashboard. For that, we just need to do an add- in. It's quite simple. Thanks to the magic triangle, I call it, the red triangle here. You click Save Script and just To Add-I n. Then it will create a script that will do the same dashboard again. Let's go back to the data table. I will close this one. You are sure that the one that you will see is not this one. Sorry, I shouldn't have closed it before doing the dashboard. I will just use this to be faster and not to create it again. Here, if you click on the red, Save Script, To Add -In. Then this is the name you will have in the add- in list, but it's to manage your add- ins, I would say, but it's not the one that will figure out in the add-in tab in JMP. The name that you're in the add-in tab is this one. For today, I would just call it Test so I know which one is this. Save. You see here, you have all the script used by JMP to do this dashboard, and I will save it in our Project X. Here you see it take the name on the first tab Dashboard only. I save it. Here I have this both tick Install after save, so it was already put in my add-in list. If the box was not ticked, then you have just to go to the location. You save your file and click on it and it's installed. Now I can close this dashboard . I have created my complement already. Now, how to use it? I just went a bit faster. If you open it, it installs. As I have it already, I will not start it. It's just under it. If you go to View, Add-ins, sorry, Dashboard, and Unregister, it's not erased. I will find it again when I go to my project. Here you see I have it. If I double -click, it ask me if I want to install it. Sure, I want to install it. It's back here again. You can share it. You just have to copy the same file I clicked on and paste it in a shared folder or send it to a JMP user colleague. You can modify it. For this you have to go in Open. Again, this is the dashboard but then the black click this time, just go on the arrow here to open and Open using Add-In Builder. Here you go back to the first time window where you have your script here, and you can either edit the script or put other functions that I don't really use, to be honest. But you have many functions. I'm sure you will find more information on JMP website about that. This, for example, will allow you to put all the preparation step in the same complement. When you run, everything is done at the same time. This is it. In conclusion, using this dashboard and add-in functions allow us to have reports consistency because we have always the same set up of results to send to the clients. We increase the traceability. Thanks as well for the use of the scripts because we are sure that we are all doing the same. 
It's a great time saver because, as you saw, I just have to click one button and I have my dashboard. If you combine this with scripts for all the data preparation, then you have your table, you click one button, and everything is done. It's a great time saver. That's all I wanted to show you today. I hope you enjoyed it. If you have any questions, don't hesitate to reach out to me either by email — you have the email on the first slide here — or through the JMP Community. Thank you.
Labels
(12)
Labels:
Advanced Statistical Modeling
Automation and Scripting
Basic Data Analysis and Modeling
Consumer and Market Research
Content Organization
Data Blending and Cleanup
Data Exploration and Visualization
Design of Experiments
Mass Customization
Predictive Modeling and Machine Learning
Quality and Process Engineering
Reliability Analysis
0 attendees
0
0
0 attendees
0
0
JMP® in Qualification and Validation of Biological Assays (2023-EU-30MP-1283)
Saturday, March 4, 2023
Statistical evaluation for biological assays is critical because a lot of data needs to be summarized for reporting to customers and authorities for drug registration. JMP® is a helpful tool not only for calculating the required parameters but also for automating evaluation. For example, it can be used to graph and automate calculations for Repeatability, Intermediate Precision, Linearity, and Robustness of a Relative Potency Assay. Because these calculations are often required, a single JMP file for calculating the parameters saves a lot of time and can be used by novice users. Furthermore, the evaluation remains consistent throughout various assays, even when different technologies are used, such as SPR, ELISA, or Cell-Based Assays. Hello, and welcome to my presentation about JMP in qualification and validation of biological assays. I've divided this presentation into five parts. At first, I want to give you a small introduction about my person and the company I'm working for, VelaLabs. The second part is a general introduction about method qualification and method validation like we perform it at VelaLabs often. The third part is how we collect and summarize the data. Then I will continue with the JMP data table where I've created some scripts to evaluate the data generated during qualification and validation. The last part, I will talk about some additional robustness parameters where different functions of JMP are used. My name is Alexander Gill, and I'm at VelaLabs since 2019. I'm a laboratory expert in the l igand binding assay group . I'm mostly responsible for method development, qualification, and validation for Biacore assays and ELI assays. VelaLabs is a contract laboratory for quality control u nder GMP conditions. We have four operational departments: the ligand binding assay group, the physico- chemical group, and the cell- based assay group, and the microbiological group. Method qualification and validation is important in the life cycle of pharmaceuticals and biologicals. Here, the life cycle of such drugs is shown from the pre- clinical phase over the clinical phases and the application. During the pre- clinical phase, developed methods are suitable which are on the scientifically sound. Afterwards, for the clinical trials phase 1 and phase 2, we use mostly qualified methods. For method qualification, we show with some suitable parameters the performance of the assay. If the assay is then validated, derived from the data generated during qualification, we create limits which must be reached during method validation. The validated method afterwards is used for clinical trials phase 3, new drug application, and also for batch release in post-marketing afterwards. Here, I've shown some examples for the performance parameters. The accuracy shows if the method has any bias or shift, or especially it lacks bias or shift/ the intermediate precision is the variability between runs where we show that different operators and different devices on different days do not influence the result. The repeatability is the variability within one run where we try to keep the differences between the reported values as small as possible. The linearity shows the dose response of the assay over the whole assay range. During the robustness, we show that different parameters can or cannot influence the result. For example, different ligand lots or different models of devices. 
Then the sensitivity to detect stability- indicating changes, there we use mostly stress samples to show that they can be easily distinguished to non- stress samples. Specificity is, for example, a blank subtraction or positive or negative controls. The data collection is mostly performed in Microsoft Excel because it's more accessible within our company. I will come later to this. We also collect the reported value, which is the final outcome of the assay. The reported value is calculated using a validated software like PLA, SoftMax Pro, or the Biacore Software. This is to ensure the data integrity. Every step where a human is involved in the evaluation has to be checked by a second operator. As I use a relative potency assay as example for this presentation, I've also shown here what's the reported value for this assay. It's the relative potency with the 95 % confidence interval as a quality parameter. Here are the reasons why we use Microsoft Excel for the data collection because it's available on every PC within our company and every employee has basic knowledge about it. The raw data from the validated softwares are also often exported in Excel. What is really important that the data in Excel are organized in datasets, so they can be transferred to JMP more easily. Here is a basic experimental design for a method qualification or validation. The first six runs are basically designed around the intermediate precision where we use 50%, 100%, and 200 % sample mimics in each of these six runs. These runs are spread above two devices, two operators, and performed on three different days. We report the mean relative potency for each of these dosage points, the standard deviation, the coefficient of variation, and the 95 % confidence interval. For accuracy, we use the same dataset as for intermediate precision, but we calculate the mean recovery, and therefore standard deviation, C V, and 95 % confidence interval both for all 18 datasets together and also for each dosage point separate. The seventh run is for the determination of repeatability where we use six 100 % sample mimics within one run and also report the mean relative potency, standard deviations, CV, and the 95 % confidence interval. Then for linearity, which is here in run 1, we use the sample mimics for intermediate precision and additionally use 75 % and 150 % sample mimic within this one run to show that the results are linear over the whole assay range. Therefore, we report the correlation coefficient, the slope, Y-intercept, and residual sum of squares. For robustness, in this case, we show a lower and a higher immobilization level and also use two different lots of the ligand. Then now, I'll show you the Excel table where we can see here in the first few columns the metadata for each data set, then the reported value with the 95 % confidence interval, the slope ratio, which is additional quality parameter and shows afterwards if the analyte is comparable to the reference. The column for recovery is empty because the recovery will be calculated in the JMP software. Here, the matrix where it's defined which datasets are used for which parameters. Then there are two different possibilities to transfer this data into the JMP software. One is with this function where a data table can directly be created out of this table. But in this case, I won't use this function because I have already created a JMP table with all the scripts I need. I just copy all the data. 
But for this procedure, it's important to show all available digits of the reported values because only the shown digits are pasted afterwards into the JMP software. I now copy with CTRL+ C all this data and then go to the JMP data table where I can paste all this data. Then we get here an alert because in the column Recovery, I created a formula to calculate the recovery. I don't want to paste the data in here, but the Excel table does not contain data in this column. We click Okay, and everything is pasted as we wanted. For what purposes JMP can be used under GMP conditions? We use it during the method development phase for design of experiments, for example, to investigate more different parameters of the method within one set of experiments. Then use it for the statistical data analysis and also for comparability studies. For example , if a customer wants to compare a biosimilar with the originator. During qualification and validation, JMP can also be used for the design of experiments. For example, for the intermediate precision parameters or to spread the robustness parameters over the qualification runs. Then I will show afterwards for the determination of assay performance in qualification and for the check of the assay performance during validation. But for this, an additional QC check is required afterwards if all the calculations are performed in the right way. This is very important that JMP is not really usable for the determination of reported values. Therefore, as I mentioned before, we used mostly validated softwares. Now we go to the JMP data table where I will first show you how I create most of the script. Therefore, I use distribution. For example, if I create the accuracy at 50 %, I select the Recovery and choose it for the Y columns. Then I click Okay. Then we have here all available datasets. To limit these datas ets, I create a local data filter and use Accuracy and edit. If it then choose all the columns indicated with an X, we have reduced the data sets to 18. To reduce it further for only the 50 % sample mimics, I add with the AND function an additional filter for the nominal potency, which I then limit to the sample mimics with about 50 % nominal potency. Then you see we have only six datasets left with mean recovery of 99 % and a coefficient of variation of about 6 % and the confidence interval. To save this script, I go again to the red triangle here and save the script to data table. For example, as accuracy 50 % 2, because I've already created a similar script here. The difference for the intermediate position, if we open, for example, here the intermediate position at 100 % is only that we not use the recovery here, but the relative potency and have also again the same parameters reported. For repeatability, w e choose only one run with the six 100 % sample mimics. We report also the same data like the mean relative potency, the standard deviation, the 95 % confidence interval, and also the coefficient of variation. What's also very interesting here is the linearity where we use a different function. I created this using a Y by X plot and plotted the relative potency by the nominal potency and created a linear fit through all these data points. Then we report the Y-intercept, the slope of the linear fit, the RS quare or coefficient of correlation, and also the sum of squares error or residual sum of squares. Then we go back to the presentation. 
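As a sketch of how the Recovery formula and one of the saved Distribution launches might be scripted: the column names follow the description above, but the accuracy-flag value and filter settings are assumptions.

Names Default To Here( 1 );
dt = Current Data Table();

// Recovery formula column (recovery relative to the nominal potency, in percent).
dt << New Column( "Recovery", Numeric, Continuous,
    Formula( :Relative Potency / :Nominal Potency * 100 )
);

// One of the saved Distribution scripts, e.g. accuracy at 50%:
// the local data filter keeps only the accuracy datasets at ~50% nominal potency.
dt << Distribution(
    Continuous Distribution( Column( :Recovery ) ),
    Local Data Filter(
        Add Filter(
            Columns( :Accuracy, :Nominal Potency ),
            Where( :Accuracy == "x" ),
            Where( :Nominal Potency == 50 )
        )
    )
);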
For additional robustness parameters, we show, for example, the performance of the assay using different material lots. For these, we check whether they have equal variances. If the variances are equal, we use the t-test; if not, we use the Welch test. For ELISA methods, for example, we also sometimes measure the plates on two different models of plate readers to show that both models can be used. This is then analyzed using a paired t-test. At the end, I want to thank you for your attention. If you have any further questions, you can type them into the Q&A or contact me directly.
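A rough JSL sketch of these robustness comparisons, again with invented column names (Relative Potency, Ligand Lot, Reader A, Reader B):

    dt = Current Data Table();

    // Two ligand lots: check the variances, then read either the pooled t-test or the Welch test
    dt << Oneway(
        Y( :Relative Potency ),
        X( :Ligand Lot ),
        Means( 1 ),               // Means/Anova/Pooled t: the equal-variance t-test for two lots
        Unequal Variances( 1 ),   // variance-homogeneity tests plus Welch's test
        t Test( 1 )               // t-test that does not assume equal variances
    );

    // Two plate reader models measuring the same plates: paired t-test
    dt << Matched Pairs( Y( :Reader A, :Reader B ) );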
Labels:
Advanced Statistical Modeling
Automation and Scripting
Basic Data Analysis and Modeling
Content Organization
Data Blending and Cleanup
Data Exploration and Visualization
Mass Customization
Predictive Modeling and Machine Learning
Quality and Process Engineering
Reliability Analysis
0 attendees
0
0
Data Driven Selection as a First Step for a Fast and Future Proof Process Development (2023-EU-30MP-1260)
Saturday, March 4, 2023
This talk will focus on how JMP® helped drastically reduce the cultivation experimentation workload and improved response from four up to 30-fold, depending on the target. This was accomplished by screening potential media components, generally the first, and sometimes tedious, step in fermentation optimizations. Taking characteristic properties such as the chemical composition of complex components like yeast extracts enables flexibility in the case of future changes. We describe the procedure for reducing the workload using FT-MIR spectral data based on a DSD setup of 27 media components. After several standard chemometric manipulations, enabled by different Add-ins in JMP® 16, the workload for cultivation experiments could be drastically reduced. In the end, important targets could be improved up to approximately 30-fold as a starting point for subsequent process optimizations. As JMP® 17 was released in the fall of 2022, the elaborate procedure in version 16 will be compared to the integrated features. It might give space for more inspiration – for developers and users alike. Hello everyone, nice to meet you. I'm Egon Gross from Symrise, Germany. From professional, I'm a biotechnologist and I'm looking forward for my presentation for you. Hello everyone. I'm Bill Worley. I am a JMP systems engineer working out of the chemical territory in the central region of the US. Hi. My name is Peter Hersh. I'm part of the Global Technical Enablement Team, working for JMP out of Denver, Colorado. Peter welcome to our presentation, Data- driven selection as a First Step for a Fast and Future -Proof Process Development . First I want to introduce the company I'm working for. We are located in Holzminden, more or less in the center of Germany, and there are two sites coming from our history. Globally seen, we are located with the headquarters in Holzminden . We have big subsidiaries in Peterborough, in Sao Paulo and in Singapore, and there are quite a lot of facilities in France also coming due to our history. Coming to the history, Symrise was created in 2003 out of a merger from Harmon and Rhymer, which was founded in 1874, in Dragoko, which is the other side, from our facility, which was established 1990. Over the years there have been quite some acquisitions and also in 2014 the acquisition of Diana, which is mainly located in France because that's the reason why there are so many different research and development hubs. Our main products come from agricultural products or from chemical processes and there are quite a lot of diverse production capacities, production possibilities for our main customers being human or pet. As is so diverse, we are dealing for food for the human consumption, for pets consumption, and also for health benefits. On the other side, the segment Scent and Care is dealing with fragrances coming from fine fragrances, to household care, to laundry , whatever thing you can imagine, that smell nicely. As I said in the beginning, I'm a biotechnologist by training and I'm dealing a lot with fermentation processes to optimize them and to scale them up or down. One major issue when it comes to fermentation is the broth, the liquid composition of the media, which will then feed the organisms. No matter which organisms that are, they need carbon sources, they need nitrogen sources, they need minor salts, major salts, pH values, and other things. It is often important which kind of media one has. When it comes to media composition, there are two big branches which can be seen. 
One is the branch of synthetic media, so all components are known in the exact amount and composition. The other way are complex media, for example, having a yeast extract or a protein extract or whatever, where it's a complex mixture of different substrates, different chemical substances. The third approach would be a mixture of both. One of the side effects of these complex media is that it's quite easy to deal with them. But on the other hand, there can be constitutional changes over time, as some vendors tends to optimize their processes, their products, to whatever region, to whatever target. Some customers get hold of those changes, some don't. Another issue might be the availability o f it's a natural product like [inaudible 00:04:38] or whatever. You might know some ingredients, you will surely not know all ingredients and there might be promoting or inhibiting substances within those mixtures. At the beginning of a process development, the media is of main importance. Therefore I tried to look at carbon sources, nitrogen sources, salts, trace elements, and so on, being my different raw materials. While growing the organisms, one has to take care of different temperature, stirring velocities to get oxygen into the liquids, cultivation time, and there are a lot of unknown variables to get an idea what the effect might be to the cell dry weight, for example, or to the different targets compounds one has in mind. For this setup, I used then the definitive screening design. As the most of you know, they are quite balanced and have a particular shape which is reflected in these three- dimensional plot. You can see definitive screening design is somehow walking around certain edges and having a center point. Due to the construction of the definitive screening design, one can estimate interactions and square effects. These interactions are not confounded with the main factors and the main factors itself are also not confounded with each other. This is a very nice feature of the definitive screening design and therefore they are very efficient when it comes to the workload compared to formerly known screening designs. Some disadvantages are also there. One big disadvantage is if you have about 50% of the factors that are working that have a high influence or even more, you have a significant influence, significant confounding, which you have to take care of. In this particular case, although it's the leanest possible design I found, the practical workload would require five to six months just for screening. This is far too long when it comes to a master thesis. The alternative was then to build another design or to build another process. I was so inspired in Copenhagen 2019 by a contribution from Bill where he talked about infrared spectroscopy and I thought why that might be a good idea, using the chemical information hidden in a near-infrared spectrum to describe the chemical composition of the mixtures. Therefore I established this workflow. First, the media preparation was done of all the 65 mixtures. Then the near- infrared spectrum was measured, some chemometric treatments were preferred and afterwards, the space of variation could be held constant at a maximum, but the number of the experiments could be reduced quite significantly. To show you how the workflow is, I started, as I said, with spectral information. One of the first principles one has to do is to make a standardization of the spectra to avoid baseline shifts and things like that. This is one way to make it. 
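To make that step concrete: if each sample's spectrum sits in its own column, the standard normal variate is simply the column value minus the column mean, divided by the column standard deviation. A minimal JSL sketch with a hypothetical column name:

    dt = Current Data Table();
    // SNV / standardization of one sample's spectrum; repeat (or loop) over the sample columns
    dt << New Column( "Sample 01 SNV", Numeric, Continuous,
        Formula( ( :Sample 01 - Col Mean( :Sample 01 ) ) / Col Std Dev( :Sample 01 ) )
    );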
Introducing a new formula to standardize, or what I did, I used an add-in to preprocess and calculate the standard normal variety, which is when it comes to the digits, the same as the standardization, as we see here. With this standardized spectra, depending on each measurement, I continued then and compiled first all these spectra. What you see here on the top is the absorption of every sample. We have an aqueous system so we took water as a reference. A fter building the difference between the absorption and the water, we then got deeper and saw differences within the spectra. One of the big question was do I calculate first the difference between the absorption of the sample and the water and calculate then the standard normal variety? Or do I first calculate the standardization and then use these standardized values from the water background? One could think the procedure is the same, but the outcome is different. As you see here, on the right- hand side of the dashboard, I zoomed into this area and in the lower part, the curves have a different shape, a different distance from each other than in the upper part. T his might have then an influence on the subsequent regression analysis. Therefore, I selected first to make the standardizing and then the difference calculations. After I did these first steps, then came the chemometric part, that is smoothing and the filtering and to calculate the derivatives. This is a standard procedure using an add-in which is available. You can imagine that the signal might have some noise. This is seen here in the lower part, the red area is the real data, and the blue curves are the smooth data. On the left upper side, you see the first derivative. On the right upper side, the second derivative of these functions. If it comes to polynomial fits, it's depending on the procedure, what you are fitting, what's the polynomial order, and how broad your area is, where you make the calculations in. If we take here only a second- order polynomial, you see that it might change. Now, this is not a two, this ought to be a 20. Then the curve smooths out. Although it's smooth, you can see differences in height, in shape. To get hold of those data, one has to save the smooth data to data tables, separate data tables. Then I tried different settings for the smoothing process, because I did not know from the beginning which process is the best to fit my desired outcome of the experiment at the end. After quite a lot of smoothing tables, which were then manually done, and I then concatenated the tables. These are all the tables we just made. I'm going to the first one and say, please concatenate all of the others. The nice thing is that you then have at the end, these different distances coming from the smoothing effect. I had a second polynomial order. A third polynomial order is 20 points to the left and to the right for the smoothing process and 30 and so on. This is just a small example to show you the procedure. I did quite more. What I did was this amount of treatment. I had [inaudible 00:15:01] for a second, third, or fifth polynomial order with 10, 20, or 30. Now came the big part to decide which particular procedure represents my space at best. This, therefore, I made a principal component analysis of all my treatments I did. This is a general overview. The numbers represent each experiment by its own that you can follow them in the different scores and loading spots. 
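The table concatenation mentioned above can also be scripted in a single call; the table names below are invented:

    // Combine the individual smoothed tables (different polynomial orders and window widths) into one table
    base = Data Table( "Smoothed p2 w10" );
    combined = base << Concatenate(
        Data Table( "Smoothed p3 w20" ),
        Data Table( "Smoothed p3 w30" )
    );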
The loading plot is… That's a regular picture of a loading plot when it comes to spectral data. If you take into account that we are coming from a design, this value of 24% explained variation at the beginning for the first component is very high. Why? Because the factors of the definitive screening designs are orthogonal to each other and independent from each other. One would expect lower values for the principal components. After this treatment, the first derivative with second order polynomial and 10 points to the left and to the right for the smoothing, it looks very evenly distributed. You might think of a cluster here on top or below. I went through all of these particular processes and selected then a favored one, where I saw that the first principal component has a very slow describing power for the variation. That's then the way I proceeded. After selecting the particular pre- processed data, I still have my 65 samples. But as we heard at the beginning, 65 is far too much for a workload. If you ask yourself, why is there 132 samples? That is because I copy pasted the design below, the original design for the spectral treatment I used then. If you want then to select your runs you are able to make due to time reasons or due to cost reasons or whatever, this is one process you can make use the coverage and the principal components. Then this is the design space which is available dealing for the all variation which is inside. But as you see, we would need to make 132 experiments. If we then go just select all the principal components and say please make only the one which are possible, then you have the ability to type in every number you want to. At this stage, I selected several smaller or bigger designs and saw how far can I go down to reach at least a good description power. I made these 25 experiments, let JMP select them. The nice thing is with this procedure, if you are coming back to your data table, they are selected. But this procedure I didn't do right at the beginning. At the beginning, I made a manual selection. How did I do that? I took the score plot of the particular treatment and then selected manually the outer points as good as possible. Not only in the picture of the first and second principal component, but I went deeper. This, for example, is the comparison of a selection method I just showed you with the DOE of the constraint factors and with the manual selection, just for showing you maybe some differences. If you make this DOE selection several times, don't be confused to get not always the same numbers, the same experiments, which might be important. With this approach, I then reduced the workload from 64 experiments to 25 experiments. In all of these experiments, all my raw materials I had from the beginning were inside. I didn't leave any raw material out, and that was very nice to see, that I could retain the space of the variation. After the cultivation in two blocks, which took a frame week of three weeks for each block, we yet then analyzed our metabolome and the supernatant and determined our cell drive mass. For time's sake, I show you only the results and the procedure for the cell dry mass. Other molecules might be the same procedure to be done then. The next issue I had was that there is a confounding. I had to expect the confounding because I had only 25 experiments for 27 mixtures coming out of a design where I knew where I supposed to have interactions and quadratic effects. 
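For reference, the principal component analysis of the pre-processed spectra described above is a single platform launch once the treated spectra are arranged in columns; the wavenumber column names here are placeholders:

    dt = Current Data Table();
    // PCA on the pre-processed spectral variables; in practice every wavenumber column is listed
    dt << Principal Components( Y( :W0900, :W0902, :W0904 ) );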
These interactions is nothing new when it comes to media composition. Quadratic effects were nice to be seen. Then came the next nice thing, which was introduced by Pete Hersh and Phil K. It's the SVEM process, the Self-V alidated Ensemble Model. In this sketch, you see the workflow and we will go through that in JMP. The first thing was to look at the distribution of my target value. After making a log transformation, I then saw that it's normally distributed. So we have a log- normal distribution. That's nice to know. The first thing was to download this add- in, Auto validation Set-up, and hit the run button. We then get a new table. The new table has 50 rows instead of 25 rows from our original table. Why is that so? The reason for that is while hitting the button, the data table gets copy- pasted below and we get a differentiation into the validation set and into the training set, as you see here. The nice feature of this Auto validation table is that you can, due to a simulation, find out which parameters, which factors have an influence. This happens by the spared fractionally weighted bootstrap weight. If you look for example, the second experiment has a value of 1.8 in the training set and the same sample has a value of 0.17 in the validation set. This then gives one the ability to have a bigger weight for some samples in the training set and vice versa in the validation set. While they have a bigger value, a bigger weight in the training set, they have a lower weight in the validation set. To analyze this, it's necessary to have the pro version to make a generalized regression. As we took the log value of our cell dry weight, I can then make a normal distribution and then it's recommended to make a lasso regression. From the first lasso regression, we get a table for the estimates, and now comes the nice part. We make simulations changing the paired weight bootstrap weight of each factor. For time's sake, I'm just making 50 simulations. From these 50 simulations, we get then the proportion for each factor we had in the design where it entered the regression equation, or didn't enter the regression equation. This pattern comes due to this randomization process of the bootstrap forest method. From this distribution we go to the summary statistics, customize them, we are just only interested in the proportion nonzero. This proportion nonzero is finally the amount of the 50 simulations. How often this particular variable went into the regression equation. From this, we make a combined data table and have a look on the percentage of each variable being in a model or being not in a model. This looks a little bit confusing. If we are ordering it by the column two descending, we then see a nice pattern. Now you can imagine why I introduced at the beginning this null factor or these random uniform factors. T he uniform factors were manually introduced. The null factor was introduced by hitting the auto- validation set. What do these points mean? These points mean that until the null factor, these variables have a high potential because they were quite often within the model- building processes. These at the bottom were quite seldom within the model- building processes so the ability to reduce your complexity is given by just discarding these. Here in the middle one has to decide what to do. 
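A hedged sketch of the generalized regression launch at the heart of this JMP 16 workflow; the response and factor names are invented, and the exact saved-script messages can differ between JMP versions:

    dt = Current Data Table();
    // Lasso fit of the log-transformed cell dry weight on the media components (plus the added random/null factors)
    obj = dt << Fit Model(
        Y( :Log CDW ),
        Effects( :Yeast Extract, :Glucose, :Trace Salts ),   // ... and the remaining components
        Personality( "Generalized Regression" ),
        Run( Fit( Estimation Method( Lasso ) ) )
    );

From this fit, the demo then uses Simulate on the parameter estimates, swapping in the paired bootstrap-weight columns, and tabulates the proportion of nonzero estimates per factor.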
After having this extraction, not losing information, and not losing variation, one can then think of different regression processes making response surface model or step wise regression or whatever regression you have in mind. It's wise to compare different regression models looking what's feasible, what's meaningful. That was the procedure I used in JMP 16. While coming now to Pete and Bill, they will describe you something else. Thank you, Egon. That was a great example of an overview of your workflow. Thank you. What's new in JMP 17 that might have helped Egon a little bit with the tools he was working with? I'm going to start off with a little bit of a slide show here. I'm going to be talking about Functional Data Explorer. That's in JMP Pro and talking about the pre- processing and Wavelet modeling that are built into Functional Data Explorer now. All right, so let me slide this up a little bit so you can see. What's new in JMP 17? We've added some tools that allow for a better chemometric analysis of spectral data. Really any multivariate data that you might have that you can think of, these tools are there to help. First is adding the preprocessing methods that are built into FDE now. We've got standard normal variant, which Egon showed you. We've got multiplicative scatter correction, which is a little bit more powerful than the standard normal variant. Both of these will not disrupt the character of your spectra. That's not the story with Savitzky-Golay. It does alter the spectra, which will then make a little bit harder to interpret the data. The key thing is it still helps. Then we have something called polynomial baseline correction, which is another added tool if you need that. The next step would be then to save that preprocess data for further analysis, like principal component analysis, partially squares, so on and so forth, so you can do some analysis there. The Wavelet modeling is a way to look at the chemometric data similar to principal component analysis. We're fitting a model to the data to determine which is the best overall fit for, in this case, 25 spectra. That's the goal here. It's an alternative to spline models. It's typically better than spline models, but not always. You get to model the whole spectra, not the point- by- point, which you would do with other analysis types. Then you get to discern these things called shape functions that make up the curve. These shape functions are, again, similar to principal component analysis in that they are helping with dimension reduction. Then, as I said before, these are excellent tools for spectral and chromatographic data, but virtually any multivariate data is fair game. These are just an example of the Wavelet functions that are built in. I could try and pronounce some of these names, but I'll mess them up, but know that these are in there. There is a site here that you can look up what these Wavelets are all about. I got the slide from Ryan Parker so thank you, Ryan. Almost last but not least, what we're doing with this functional principal component analysis is we're trying to determine, again, what's the best overall fit for these data and then compare the curves as needed. What comes out of the Wavelet modeling is a Wavelet DOE, and we determine which wavelengths have the highest energy for any given spectra or whatever we're looking at. These Wavelet coefficients can then be used to build a classification or quantification model. That's up to you. 
It depends on the data and what supplemental variables you have built in. In this case, this is a different example where I was looking at percent active based on some near IR spectra. Let's get into a quick example. All right. This is Egon's data. I've taken the data that was in the original table, this absorption minus the water spectra, and I've transposed that into a new data table where I've run Functional Data Explorer. I'm just going to open up the analysis here. It does take a little bit to run, but this is the example that wanted to show. We've done the pre- processing beforehand. We've taken the multiplicative scatter in this case and then the standard normal variate, and then built the model off of that. After this function or these pre- processing tools which are found over here, I'm going to say that data out, and then that data is going to be used for further analysis as needed. To build on the story here, we've got the analysis done. We built the Wavelet model. After we've gotten the mean function and the standard deviation for all of our models, we build that Wavelet model and we get the fit that you see here. What this is telling us is that the Haar Wavelet is the best overall based on the lowest Bayesian Information Criteria score . Now we can come down here and look at the overall Wavelet functions, the shape functions, and get an idea of which Wavelets have the highest energy, which shape functions are explaining the most variation that you're seeing between curves, and then you can also reduce the model or increase the model with your selection here with the number of components that you select. One thing that comes out of this is a Score Plot which allows you to see groupings of different in this case, spectra. One that you're going to see down here is this. This could be a potential outlier. It's different than the rest. If you hover over the data point, you can actually see that spectra. You can pin it to the graph, pull that out, and then let's say let's just pick another blue one here and we'll see if we can see where the differences are. It looks like it might be at the beginning . If we look at this right here, that's a big difference, then maybe that just didn't get subtracted out or pre- processed the same way in the two spectra. I don't have an example of the Wavelet DOE for this set up, but just know that it's there. If you're interested in this —this has been a fairly quick overview— but if you're interested in this, please contact us, and we will find a way to help you better understand what's going on with Wavelet DOE and preprocessing built into JMP Pro. Pete, I will turn it over to you. All right. Well, thanks, Bill and Egon. Just like Bill, I am going to go over how Self-Validating Ensemble Models changed in JMP 17. Bill showed how you could do what Egon did in 16 in 17 much easier using Functional Data Explorer. For me, I'm going to take that last bit that Egon showed and with the add- in, creating that SVEM set up. Using those partially weighted bootstrap columns and then also making that validation and the null factor. I'm going to just show how that's done now in JMP® 17. So this is much easier to do in JMP 17. Just like that, spectral data processing with FDE, this is done in JMP 17. If you remember, Egon had gone through, he looked at all those spectra, he extracted out the meaningful area, looking at smoothers, the standard normal variant, and did a bunch of different pre-processing steps. 
Then he took those preprocessing steps and he selected a subset of those runs to actually run, and he had come up with 25. Here is those 25 runs. From this step, what he did is that Self-Validating Ensemble Model or SVEM. In 16, this took a little bit of doing. You had to make that model, then you had to simulate, then you had to take those simulations, and run a distribution on each one of them, and then get the summary statistics, and then extract that out to a combined data table, and then graph that or tabulate that and see which ones happen the most often. That was a lot of steps and a lot of clicks to do, and Egon has clearly done this a bunch of times because he did it pretty quickly and smoothly, but it took a little bit of doing to learn. Clay Barker made this much easier in JMP 17. Same 25 samples here, and instead of running that Auto validation Set- up add- in that Egon showed, we're going to just go ahead and go to Analyze and Fit Model. We'll set up our model. I f you remember, we're taking this log of the dry weight here. We're going to add a couple of random variables along with all of our factors into the model, and then we're going to make sure that we've selected generalized regression. This is the set up for our model, we're going to go ahead and run it, and in JMP 17, we have two new estimation methods. These are both Self-Validating Ensemble Model methods. The first one is a forward selection. I'm going to go ahead and use SVEM Lasso because that's what Egon used in his portion, and here you just put in how many samples you want. He had selected 50. I'm going to just go with the default of 200. Hit go, and you can see now it's running all of that behind the scenes where you would have simulated, recalculated those proportional weights, and then at the end here, we just have this nice table that shows us what is entering our model most often up here. Then when we hit something like a random variable. Just out of randomness, something that's entering that model is entering maybe about half the time. Things that are entering more regularly than a random variable, we have pretty high confidence that those are probably variables we want to look at. Then we would go from here and launch the Profiler. I've already done that over here, so we don't have to wait for it to launch or assess variable importance. But here, this shows us which of these factors are active. We can see the most active factors, and while it's not making a super accurate model, because again if you remember, we are taking 25 runs to try to estimate 27 different factors. If you take a look here at the most prevalent ones, this can at least give you an idea of the impact of each one of these factors. All right, so that basically sums up what Egon had done. It just makes this much easier in JMP 17, and we are continuing to improve these things and hope that this workflow gets easier with each release of JMP. Thank you for your attention and hopefully, you found this talk informative.
Labels:
Advanced Statistical Modeling
Basic Data Analysis and Modeling
Consumer and Market Research
Data Blending and Cleanup
Data Exploration and Visualization
Design of Experiments
Predictive Modeling and Machine Learning
Quality and Process Engineering
Reliability Analysis
0 attendees
0
0
Advanced Workflow Builder: Using the Reference Manager To Create Custom Tools (2023-EU-30MP-1339)
Saturday, March 4, 2023
The new Workflow Builder introduced in JMP 17 is a great time saver for automating a fixed set of tasks. But wouldn’t it be great to create a workflow you could use on any data table? With Workflow Builder’s Reference Manager, you can! In this presentation, we will show how to record a workflow using a specific data table, then modify it so it can be used on any other table. Using this technique, you can build a custom tool with minimal JSL coding and share it with your colleagues. Hello, my name is Michael Hecht, and I am here to talk today about the workflow builder, which is a new feature in JMP 17 that allows you to record operations that you do within JMP and then play them back to recreate those operations. If you attended the plenary session this morning, then you saw Mandy Chamber's demo of workflow builder, which gave a good overview. Mandy also gave a talk this morning that goes into more detail on the workflow builder user interface. I'm going to start talking about a more advanced feature of the workflow builder called the reference manager. This is a feature that allows you to take a workflow and manage how references to data tables and columns within those tables are mapped and resolved. It allows you to make workflows that are more generic to be used with any data table. Let's get started. I'm going to start by creating a new workflow which is under File, New and there's New Workflow. You see the little tag that says this is a new item for version 17. When I choose this, I get this untitled workflow builder window. The panel in the center lists the steps of my workflow and as you can see, they're empty. I'm just going to click the record button, which is this button with the big red dot. You see it changes appearance to show that recording is in progress. Then I will do some operations in JMP and have them be recorded. First, I'll go I'm going to go to the File menu and I'm going to open up a data table. We see Big Class here opened and if we look in the workflow builder, we see that a step was added to do that same operation. Now I'm going to go to the Analyze menu and I'll do a Fit Y by X to do a one- way analysis using age as my X factor. For the response, I'm going to create a transform column by right clicking and choosing formula. I'm going to use weight divided by the height squared, which is the formula for body mass index. If we were using metric units of kilograms over centimeter squared, then that would be sufficient. But Big Class has its measurements in imperial units of inches and pounds. We have to multiply this by a scaling factor to get a standardized BMI. Now I have my formula correctly, I'm going to rename my transform column 'cause I don't want to use the default name, I'm going to call it BMI. Click okay. Click okay. Now BMI is in my list of columns that I can use to create the report. I'm going to add that as my response. Click okay. Here's my one- way analysis of variants. You notice, though, that it did not yet add this step to the workflow builder, but there is a little note at the top saying, hey, I see you launched a platform. A s soon as you finish with the analysis and close the window, I'll add it to the workflow. That's so the workflow builder can capture any changes you might make to the analysis. For example, I'm going to turn on means ANOVA here to get the means diamonds. Now when I close the window, you see the step gets added, report snapshot, Big Class, Fit Y by X of BMI by age. Great. I am done recording. 
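Outside the workflow recorder, the same analysis is only a few lines of JSL. A sketch that builds BMI as a regular column (the demo used a transform column instead) and assumes 703 as the usual imperial conversion factor, which the demo does not state explicitly:

    dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
    // BMI from pounds and inches: 703 * weight / height^2
    dt << New Column( "BMI", Numeric, Continuous, Formula( 703 * :weight / :height ^ 2 ) );
    dt << Oneway( Y( :BMI ), X( :age ), Means( 1 ) );   // Means/Anova turns on the means diamonds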
I'm going to click the button again to stop recording. Then I'm going to turn, switch to presentation mode for this workflow. Like the tool tip says, that removes some of the editing controls. Namely it removes the record button so that I don't accidentally hit it again, and it takes away this activity log at the bottom. I'm going to rewind my workflow to the beginning, which closes windows that were associated with it. Then I'm going to click the Run button to replay it just to make sure it does what I wanted to do. I click Run and it opens Big Class. Then here's the analysis just like that we had it before. That's great. I want to look at these different steps in the workflow. You can see that behind each one is some JSL that shows up in the tool tips. But I can open up this step settings panel to see more details. I'll click on the first step and we see that there's some metadata information and then the JSL to run it, which is just an open command and there's a path name to the file. When I click on the second one, I see that we have JSL that sends a one- way message to this reference to the bigc lass. Jmp data table. Inside the one- way, we see references to weight and height and age. There's a reference to BMI, but BMI is computed as a transform column right there. That all looks good. But when I run this workflow, I don't always want to run it on Big Class. I might want to run it on a different data table. I'd like the user of the workflow to be able to choose that data table. I really don't need this open data table step at all. I'm going to select it and click the trash can icon here to remove it from the workflow. Well, now what's going to happen when I run my workflow? Let's give it a try. I'll click the Run button and it immediately prompts me to choose a data table. The data table Big Class, it can't find it anywhere. It's not in the list of tables that are currently open in JMP. It's asking me, do I want to go find it and open it or maybe open a different table? It says that anywhere that Big Class is referenced, it will use the table that I opened. It has a list here, but there's only one item in it . I'm going to select that and click okay. Now it's prompting me to go ahead and open a table. Well, I have Big Class right here, so I'm going to drag that in and click Open, and it runs just like before. Well, that's great. Let me rewind and run it again, but this time I want to choose a data table that's not Big Class. Let's see what happens. I'll click okay, and I want to use this table Football. Football is data about a college football team playing American football as opposed to rest of the world football. I'm going to open that and there it is. But you see, I'm immediately prompted, hey, this data table does not have a column named age. What would you like to use instead? I'm going to choose position and I'll click okay, and I get my one- way analysis. Notice that it did not prompt me for height and weight because those columns already exist in the data table, so it just uses them directly. T hese positions are now the categories that it's using for the one- way analysis, and they're all the abbreviations for different positions in American football. You have your defensive back, defensive lineman, full back, half back. I don't know what IB is, but kicker, offensive back, offensive lineman, quarterback, tight end. You see the wide receivers have a nice little group here with low BMI because they have to be fast. That's all cool. 
But I noticed in this football data table right below position, there's a second variable called position 2. I f we look at what that is, it's a different categorization of the data. Position divides the data into 11 categories, but position 2 divides it into 7. It might be interesting to run my workflow using position 2 for comparison. I'm going to rewind this and I'm going to run it again, but this time I'll choose position 2. Well, wait a minute, it didn't even give me a chance to choose the variable. It just went ahead and used position. In fact, it didn't ask me what data table to use. It decided Football was already open, so it could just use that. Somehow the workflow is remembering my choices from the previous run. If I want to choose a different variable, I have to somehow prevent that from happening. Let's take a look at this workflow. We see that there's an option on the red triangle menu that says References and has a sub menu, allow replacement of references. Well, that's already checked and the tool tip says it allows prompting for tables and column references that it can't find. Well, that's exactly what happened the first time we ran it. But then when we ran it again, it reused the replacements that already had. But here's the second option, Clear Replacements. The tool tip here says that it clears the previous replacement choices, which is what we want. Let's do that, and then we'll rewind this and run it again. Okay. Now it's prompting me for a data table again. Because Football is one of the tables already opened, it appears in the list here. I can just pick it, click okay, and now it's prompting me for age again. This is great. I can pick position 2 and click okay. You can see the seven categories that are defined by position two here. Okay, well, let's go back to position 1 now. Rewind this and run it again. Again, it didn't give me a chance to choose position 2. I don't really want to have to clear the replace ments every time that I run this workflow. What I need is some way to control which replace ments are remembered. I see this third option here that says Manage. P ool tip says it manages the table and column references that can be replaced at runtime. Let's see what we can do here. This brings up a window called the Reference Manager. At the top is this check box, allow replacement of references, which is pretty much the same as that first sub menu choice. Just like that, when it's checked. Then there's a button reset all replacement choices, which sounds the same as the clear replacement menu item. Then we have a list of table references. There's only one item in the list. That's because my workflow only has the one reference to Big Class. If this were a more complicated workflow that accessed multiple tables, then all the table references it used would appear in this list. You select one of them and then you have details that you can change. I can see that I can add a custom prompt rather than the big long prompt. I'm going to use something a little simpler. How about please choose a data table. I see that this mode is set to prompt if necessary. It's necessary when it can't find Big Class or it can't find what you see Big Class is mapped to here Football. But what I want is for it to prompt every time I run the workflow, so I want to choose each run. Then down here, we have a list of the column references that the workflow uses from this data table. We see BMI, which is the transform column. 
That should never be prompted before because we're computing it within our workflow, so I can change this one to never. You see age is remembering its mapping to position and it prompts if necessary. We want to change that to prompt on each run. I'm going to also give a custom prompt for it about to please select a category column. I'm going to copy that so I can reuse it. Height and weight are both referenced here and as we saw, it can pick those up automatically if they exist in the data table. That's good. I think if necessary, it's doing what we want. Let's leave it at that, but I'll give it a better prompt. We'll change this one. The same thing for weight, we'll change here. I think we have all of our settings the way we like it, but we still want to clear these mappings to the Football data table. I'm going to click button and hopefully, those will go away. All right, t hat's good. I'll click okay. Let's rewind this and try it again. It's letting me choose, and I can go back to position. You saw the new prompts in both of those dialogs, the custom prompts that I chose. That all works well. In fact, I can even... Let's see, let's go open our Big C lass table. If I run the workflow now, you'll notice that even though the workflow was originally recorded using Big Class, because I set the prompt to each run, it's prompting me now instead of just using Big Class directly. But I'm going to choose Big Class. Now, even though Big Class has an age column, it's asking me to select a category so I can choose something different like sex. Then you get one way of BMI by sex. This is all doing what we want. I am going to save this workflow from the File menu, Save, I'll give it a name of BMI and it automatically gets an extension of . jmpflow, J-M-P F-L-O-W. We'll save it to the desktop and there it is. You see it has this little org chart looking icon. If I get info on it from the finder, we can see that the hidden extension .jmpflow is right there. This is great. I could distribute this to my colleagues and they would open it on their installation of JMP. Then when they run it, it will prompt them for a data table, it'll prompt them for a category column, and it'll produce the report. That's fine. That's a great way to do it. I think I'm going to take it one step further, though. I would like to package this within an add- in. An add- in lets you customize items on the JMP toolbar and menu bar. For example, you can see I have an add- ins menu here already with some items in it, but I'm going to create a new add- in to put my own command there for BMI. Let's go to the File menu and we'll create a new add- in and this add -in builder dialog appears. Let's give it a name. We'll call it BMI Report. Add-ins need a unique identifier, which is just a string, but we use this reverse DN S system that you take your company's website, like ours is Jmp.com and reverse it, s o we'll use com.j mp. I am the only hashed at JMP, so I'll put that in here, and I'm going to give it something unique for this specific ad- in that I'm creating. We'll call it BMI-R eport. I'm going to select all that and copy it. This is version 1 of my add- in. As I said, the workflows are a new feature of JMP 17, so I'd like to set the minimum JMP version to 17. Unfortunately, it looks like we forgot to add that as a possibility to this menu. I'll just do the best I can and choose 16, and hopefully, we will get that corrected for JMP 17.1 coming next month. I want to add a menu item to the add- ins menu. 
I'll click add command and we'll name it BMI Report. I'll even give it a tool tip, create a one- way analysis of body mass index by the chosen category. That's pretty good. Now I do need to add some JSL here, but it's pretty simple. All I want to do is open bmi.j mp flow. However, I'd like to embed my BMI workflow within the add- in that I'm creating, which means I need to tell the open command that it comes from this add- in's home directory, which I can get to with the path variable dollar add- in_ home. I have to give it the add- in's unique ID as well in parentheses and put a slash there for the directory separator. That looks good. Now I need to actually add my workflow as a file embedded in my add- in. I'll go to additional files and add it there. That I believe is everything. Let's close this, save changes. It gives me a default name of BMI Report. We'll save it on my desktop. There it is. It should have this workflow embedded in it. I'm going to close it here and I'm going to put this workflow that I built in the trash, empty the trash. When we save an add- in, JMP automatically installs it as well. If I come over to JMP and look at my add- ins menu, now we have BMI Report. When I choose that, it should open the workflow that's embedded within the add- in. There it is. We can run it. We get our custom prompts before, we can choose a table, we can choose a category. I am going to redo the analysis here so that I have a copy of this that's not under the control of the workflow and will stay around even when I rewind the workflow. Let's rewind it. I'll run it again and let's choose position two. You can compare the two reports and let's see, we can look at things like, this O category under position two that corresponds to the full back, the half back and the quarterbacks, so it's the offense. L is the defensive line and the offensive line, so that's the linesman. Anyway, that pretty much concludes the items I want to cover in this talk. I direct you again to Mandy's presentation this morning, Navigating Your Data Workflow: Workflow Builder Grants Your Wishes for Data Cleanup, for a great overview of of the rest of the workflow builder UI. I believe at this time we are going to take live Q&A. Thank you very much.
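For reference, the add-in's menu command described above comes down to a single Open call against the add-in home directory; the unique ID below is only a placeholder for whatever ID the add-in was actually given:

    // Menu-item script: open the workflow file embedded in the add-in
    Open( "$ADDIN_HOME(com.jmp.yourname.BMI-Report)/BMI.jmpflow" );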
Labels:
Advanced Statistical Modeling
Automation and Scripting
Basic Data Analysis and Modeling
Consumer and Market Research
Content Organization
Data Access
Data Blending and Cleanup
Data Exploration and Visualization
Design of Experiments
Mass Customization
Predictive Modeling and Machine Learning
Quality and Process Engineering
Reliability Analysis
0 attendees
0
0
Using Graph Builder’s Splines To Align Measurement Curves (2023-EU-30MP-1329)
Saturday, March 4, 2023
A frequent task in data analysis is aligning curves before a descriptive or root cause analysis. Often an additional complication occurs when the measurement intervals are not equidistant in the series to be compared. There is not one single value that quantifies the shift for a whole curve. Interpolation is the solution in cases like this. Simple linear interpolation may lead to numerous random errors; a spline interpolation is more robust. Since the Graph Builder exports the formula for the spline in its current shape, it became an easy, accessible tool for the alignment of curves. And as all steps can be programmed in JSL, it provides a framework for automating curve alignment. This presentation will describe the background, concept, and case study application for the alignment of curves. Welcome, everybody, to this presentation about a use case of curve alignment. Experienced analysts often say that in a larger analytical project, plus minus 60 % of the total time goes into the preparation of data. If curves play a role and especially the alignment of curve is needed, then that is certainly close to the truth. Curves are very specific types of data, and JMP has some tools to work with curves and to address all the related problems. In the sample library, there is the Algae Mitscherlich data, which is one of my favorite data sets with that respect because it has the option to deal with many aspects of fitting curves. This is just an example of the development of Algae density in different treatments. The type of curves that I'm going to talk about are typically observations or measurements over time. But this doesn't mean any loss in generality. The presented concepts work in all kinds of curve relationships. This is an example, A lgae measurement over time, and one of the aspects that is in the focus of the analysis for this data set is to specify curves, specific types of curves that have a known shape and are driven by certain parameters and then to estimate those parameters based on the data. In those cases, the parameters very often have a technical meaning like slope, inflection point, limit that gets approached. That platform here also has the sliders that let you analyze how changing one of those parameters affects the shape of the curve. In the specific case that we are going to talk about, we are not specifically interested in the curve. The curve itself is only a help because we are facing another problem. This is the data set, or this is a part of the data set that goes back to the real problem. We had this series of measurements, one and another series of measurements, and they belong to two different devices. U nfortunately, the clocks of these devices were not in sync. But luckily, each of the devices had one sensor that measured the same substance. W e could look for times where the measurements were very close to each other. Then try to find out how to correct one of the clocks, so to say, so that we get aligned measurements, and then use those to evaluate the data from all the centers that have been available in that data set. What was the problem of the task? You see the curves here. The red curve is the one that we took as the reference curve, and the blue one is the one that we wanted to shift. You see not only that the curves are quite some distance away, although they should theoretically have measured the same substance at the same time, but also the time points of each series is completely, or the time point of both series are completely unrelated. 
With the bare eye, we don't see any lag that we could use to correct one of the data sets. Therefore, I looked into… I compared the time points, not the Y measurements, the time points of the two series. If there was just a shift, then we would expect to see all the data, all the points exactly on one line. But here you see there are ups and downs, so this is obviously not the case. Perhaps we can see more, we can understand more if we calculate row by row in the data table, we calculate the difference of the two times and look at those. Here with some fantasy, we see a little bit of curvature, so till the end, it seems to be closer related than to the beginning. But in the beginning, this looks like real random data. This as well does not help us a lot to figure out how to relate the data. I thought I had the link to the data table, but we can look in this screenshot as well. This is the data set that you have seen before, a little bit annotated. We see that two lines have specific markers, the star and the circle. This is due to the fact that the whole measurement project had a ramp- up phase, and at the star point, the measurement series, the measurement time, the real process time started. The circle is there where we, after visual or manual inspection, saw the starting point in the second time series, and we want to align both. W e need to change the relationship of the rows, of the data and the rows. We want to shift one of the data sets, and that reminded me of the paternoster that I like to use when many years ago I was working for a company that had a very old administrative building and we had the paternoster in there. It came to me that the strategy that we are following is part of paternoster shift, which gives the word elevator pitch a completely new meaning, by the way. How do we find the right steps, the right place to fit? We do not have similar or identical time points in both series of times. We need to construct those time points somehow. Of course, this is done through interpolation. T he first thing that comes into mind is linear interpolation, and if I zoom in into only a part of the data set, then it is evident that linear interpolation, so just checking, so to say, the regression between neighboring time points has some problems, especially if we look into areas where we have horizontal lines, which may easily happen. Then the time point in that range is quite arbitrary. It always leads to the same results. The opposite is true. If we are in an area with a very steep ascent, then a little change on the X- axis or the time may lead to significant changes in the Y value. T his is not a very good technique. We can use splines to interpolate between the values. You know splines, certainly from the graph builder. If you make a scatter plot, then by default, the smoother is switched on and the smoother provides splines. You can even change the stiffness or the degree of fit or the closeness to the data with a slider in the graph builder. The advantage of using a spline as an interpolation tool, also takes into consideration points further away. T hey build a smooth curve. That is why in the graph builder, it's called the smoother. This makes it easier to use those for interpolation and to use these as well as a base in our alignment process. We need to fit splines. How can we do this or which platforms do help? First of all, simple tool is Fit Y by X comes into mind very fast when you work with JMP and do that data analysis. This is the data, one of the curves. 
There is the spline fit, and here is the slider that let me choose how close or how close I want to fit my data. Very good, very easy to use, and you can save the spline but only the values, not the formulas. We are keen on getting the formula for the spline. Next stop, Fit Model. If you have a continuous variable, you select it, you can give this the attribute of being a knotted spline effect. When you do so, you are prompted to say how many knots that spline should have, the more, the more flexible. I accept the default, say run. We get the typical report from Fit Model. A lso, we have fit models, functions of saving formulas. We can use Fit Model, save the formula. Little disadvantage here is I need to specify the number of knots before I start the analysis. Once the analysis is done within the platform, I don't have the option to play with it or change it like it is, for example, in Fit Y by X. Another tool is the Functional Data Explorer . T he Functional Data Explorer has splines as a core function, and it is also functionality to find optimal definitions, optimal fits for the splines. You can export everything. It's a bit because simple tasks like this is not where the Functional Data Explorer is made for. You need some more clicks to come to a result. A lso, it's only available for people who have JMP role. Remains, the Graph Builder. You have seen it before, and this time I want to show the spline control as well. As I said, we can use the slider to determine the fit. A very nice feature, by the way, is that you can check this box, then through a bootstrap sampling method, the confidence interval for the smoother is calculated or estimated. You see how that changes when I'm… Now you can see better, I have a lot of data and there is not too much variability. H ere the confidence band is quite small. But if we zoom into one of these areas here, for example, that place and look at what happens when I change this, then we see that the smoother can even… That the line of the smoother can even walk out of its own confidence band. This is another visual help to find out a good fit for the smoother, for the spline. It should stay within its own confidence limit. Then comes the very important option here. We can save the formula. Then we have a formula for this spline. The graph builder surprises as a modeling tool. Who had expected this? How does that help? This is again part of my data table, small part. You see that now I have two columns here where I saved the formulas for the smoother too. Down here in the colored rows, I put some arbitrary time points in. That leads to an interpolated response relative to the time point that I have given. It only works for interpolation. We cannot extrapolate this way, it's only with interpolation. But this way I can, for example, manually add different time points. I have this one here plus X seconds in that case , and then I can see what is the difference of the interpolated value. Now I can put reference times in and I see exactly what is the expected value, plus minus a little bit for both measurements. I did this for two different phases. I can go here and experiment anymore. In my journal, you see in the yellow rows, I added eight seconds. In the orange ones, 10 seconds. Depending on what you want to do, this is the principle of how you can work with this. If your task is a one- off task, this is good enough. You can go in here, play with the data, see the difference. Our task was more regular. 
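Before the JSL listing discussed next, here is a minimal Graph Builder launch for one of the curves; the column names are placeholders, and adjusting the smoother or saving its formula would be done with additional messages of the kind described in the following paragraph:

    dt = Current Data Table();
    // Scatter plot with the smoother (spline); its stiffness is controlled by the lambda slider
    gb = dt << Graph Builder(
        Variables( X( :Time ), Y( :Reference Signal ) ),
        Elements( Points( X, Y ), Smoother( X, Y ) )
    );

Saving the smoother's formula, as shown in the demo, adds a formula column to the table, so an interpolated value for any in-range time point is obtained simply by evaluating that column in a new row.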
The good thing is everything can be controlled with JSL. As usual, for many commands that you do manually on line, you have corresponding JSL statement, and I just listed some. This is not a working program. First of all, you need to set up the graph, clear. Then you have commands that you can send to the graph and specifically to the smoother element in your graph. We can change the smoother so we could even interactively try to determine good fits. We can also give the command to save the formula in the data table. That is the command that plays an important role for our solution here. You can read out the current settings of the Lambda slider and something small. How did we want to use this? The concept here was, of course, first you need to determine what is the reference curve and what is the objective curve, the one that needs to be shifted. Then you calculate the spline function for the reference curve and determine the direction of shift. Where are we? Do we need to shift our time up or down? Then we move the Y values of the objective curve, one row in the desired direction and calculate the spline function for this new curve. Save that, use a reference value, and then calculate the differences in Y for each row. Then we take the total sum of those differences as a criterion when to break the process. Because after every step, we calculate that difference, we save the difference, we do the next step, and we check, was there an improvement? If yes, we move up or down one row more, and then we repeat that whole activity until there is no improvement anymore. The whole program in the real project, of course, runs behind the scenes. You wouldn't see anything. But I added did some graphs to make it visual to demonstrate how that works step by step. The starting situation is this one. On the left- hand graph, you see the dashed line and the solid line. The dashed line is the reference line. The solid line needs to be moved. On the right side, you see the differences per row. In the beginning, the differences are… You see that here in the starting area, the differences here are pretty small. Then they get larger and larger, and they are negative. That is why it goes down here on a negative scale, very small differences in the beginning, and then they get up larger. This is the starting situation. You will see this picture again. Then the program will start shifting the reference curve one cell up in our situation, our case. Then you see how these graphs update for every step. Yes, first we need to tell JMP what are the time and measurement values for the reference and the objective curve. Here we go. It will take a little bit in the beginning, then afterwards the steps come faster. You see how for every step, the blue curve approaches the dotted curve, and how the differences decrease. The last step did not improve the situation anymore, therefore, the program stepped one step back. Now, we have the data table in a situation where we shifted up the objective curve. Now we can use this shift for all the other measurements, for all the other sensor results that we had for this device and start the analysis. That was it. I hope I could inspire you a little bit. It was an interesting presentation. If you have any questions, please don't hesitate to contact me. My email was on top of the presentation, bernd.heinen@stabero.c om. Thank you very much.
Labels
(12)
Labels:
Advanced Statistical Modeling
Automation and Scripting
Basic Data Analysis and Modeling
Consumer and Market Research
Content Organization
Data Blending and Cleanup
Data Exploration and Visualization
Design of Experiments
Mass Customization
Predictive Modeling and Machine Learning
Quality and Process Engineering
Reliability Analysis
0 attendees
0
0
Want To Know What Analysis You Need? Just Ask the Data Analysis Director (DAD) (2023-EU-30MP-1204)
Saturday, March 4, 2023
When analyzing data, scientists and engineers often know what they want to accomplish but are unsure which statistical test is needed. This talk introduces a new Add-in, the Data Analysis Director (DAD). This Add-in was designed to make it easier to find the proper statistical test in JMP® based on the analysis task, goal, and type of data you have. DAD provides a guided flow to help you find the right analysis and run it in JMP. It includes built-in examples, links to JMP Help, demo videos, and even lets you launch the analysis on your data. As you will see, DAD is a useful tool that can help guide you along your analytic journey. Thank you for the introduction. My name is Mia Stephens, and I am a JMP product manager. And I'm also the lead developer of STIPS, which is our free online course that we'll talk about in a few moments. And my name is Peter Hersh. I'm part of the JMP Global Technical Enablement team, and I did a lot of work finishing up and developing the Data Analysis Director, which we're going to be covering today. I'm going to get us started. We'll start by talking about STIPS —Statistical Thinking for Industrial Problem Solving —and how the development of STIPS was really the beginning of the Data Analysis Director or DAD. If you're familiar with STIPS, this is our free online course. If you were at the Discovery in Frankfurt a few years ago, you heard us talk about this for the first time. STIPS is 30 -35 hours of online training for anyone who wants to learn how to build a foundation in statistical thinking. It covers the basics, from learning how to define a problem; exploratory tools, and how to communicate the message in your data, how to prepare your data for analysis; quality methods, SPC capability, measurement systems analysis; basic inferential statistics like hypothesis testing and sample size; correlation and regression; fundamentals in design of experiments; predictive modeling, and text mining. This is just an introduction of these topics. All in all, it's about 30 -35 hours. As we set out to develop this course, we wanted to make sure that we included the right topics and topics that are most commonly used in the industry. And we also wanted to make sure that we understood the challenges that users face in industry. Before we started developing any content, we did a survey. One of the questions we asked was, what are the most common analysis tasks and methods that you use in industry? SPC was at the top of the list with some of the other quality methods, DOE and hypothesis testing. This part of the survey allowed us to identify the general groupings of topics that we would include in STIPS. And relevant to this talk, the second question, what are the biggest challenges you face when you're using data to make decisions? We weren't very surprised to see data preparation at the top of this list, but something that was a little bit surprising was understanding which method to use and how to use it. As we're developing STIPS, there are a lot of topics included in STIPS, and if you're learning statistics for the very first time, we knew that this could be a little bit overwhelming. We developed this concept of a tool that would help you understand, "Well, which method do I want to use based on what it is that I want to know, what it is I want to do with data, and what type of data that I have?" A t the time, we call this the Data Analysis Assistant. It was basically an unfolding utility where it started with just a general statement. 
In general, what is it that you want to do? And then based on how you answer this question, it allowed you to drill down. If I chose, "I want to describe a group or groups," and then the next question I answered, "I want to explore relationships between two variables and my data is continuous," then it gave a recommendation. A statistical technique that might make sense is scatter plots. You can find this in the Graph Builder or in Fit Y -by -X. And we provided a link to some data sets that were used in STIPS. Our original plan was that we would have this really as part of STIPS to accompany STIPS so that people could refer back to it after the fact. But STIPS took several thousand man-hours to develop and time got away from us. Fortunately, Peter was on the STIPS development team and he saw the value of a utility like this. I'm going to turn it over to Pete and Pete's going to talk about how this original concept, this data analysis assistant, ultimately became DAD. Thanks, Mia. Let me share my screen here. There we go. The motivation from this actually really came from a couple of customers reaching out and asking exactly what Mia found in that survey. When they had new users coming to them, they weren't quite sure where to go into JMP to do the analysis they were after, so they didn't know what technique to use when. Several of our customers were communicating through their training organizations that, "Hey, it'd be great if there was some way to direct people to the analysis they wanted." I reached out to Mia, and she had already laid the groundwork with that data analysis assistant and developed all of the tasks that most people were needing to navigate to, and all I did was take that and finish it off. Let's get in and actually look at what this Data Analysis Director looks like. When you launch it, it's going to look like this, and this is just an application that is inside of JMP, and we have it deployed as an add-in. And we'll share the link on where you can get that add-in. But you'll notice here that as I pick a task from this side, it will give me several different options for goals for that specified task. And then when I pick a goal, it will let me know if there's different types of data that might have that same goal. Once I do that, then all of these buttons down here become active and I can do different things. So to give you an idea here, let's say I wanted to compare groups. I have two or more independent populations, and then there's only one type of data that I'm looking for. If I then launch an example, you'll see JMP will automatically launch this sample data set and run that example. T his is a great start and this is where we started with JMP 16 was the ability to do that. We can also take you right to the help menu for that specific test, the launch analysis, which will allow you to just bring up that analysis, and then also a demo video. This demo video just links to our learning library, which is this great resource that is basically answering a question of how do I do X task in JMP? In this case, the Data Analysis Director just allowed me to figure out that what I wanted to do was a two -sample T -test, and here's a quick 2- 5 minute video on how to do that. The new thing with JMP 17 that we added was this workflow. This is really nice for people who maybe don't have a ton of statistical background and maybe don't know what they're looking for inside a JMP. When I open the workflow... This is a new feature in 17. 
So if you're operating in JMP 16 or older, you won't have workflows. But with JMP 17, this is a nice new feature. When I hit play, what JMP does is it opens that data set, just like the example I had launched before, but now it has some extra capability in there. It's highlighting some of the reports where I should look. It's telling me about that report. It's also stepping through and telling me what each one of these reports means. Then at the end, I've done that. That's great. You could probably get there with scripting as well to be able to recreate this, but workflow makes this a lot easier. And then it also allows for a generalizable aspect to this. This is key, especially with this Data Analysis Director. We encourage folks to go ahead download this add-in, but you can make it your own. And how you might make it your own is by taking a look at the things that come with it. First off, with workflows, if I do not prompt JMP to open a data set, so I'm going to just remove this data set, and I hit play... I forgot to close that behind the scenes. Excuse me one second. I f this data set isn't open and I am looking for a specific analysis, so here I'll just open a different data set. If I hit play here, JMP is going to tell me, "Hey, I don't have that data set you are looking for, but unlike a script, I'm not going to crash. I'm just going to go ahead and prompt you to pick a data set." I'll hit okay, and then it says, "Hey, pick a continuous variable in that data set." Again, unlike a script, if it doesn't find that column that you're prompting it to find, it just won't run. For this workflow, it's saying, "Oh, okay, pick a continuous Y. All right, now pick a column to replace gender." And now it's running through that same analysis, it's giving me that same report. T his makes this very generalizable. T hat's the nice thing here with that workflow. If you want to make this your own, with the add-in, you get this nice easy table that has is all of the scripts behind the scenes that you can edit. You can use your own sample data set that is maybe more relevant to your company. You can use different workflows. We have our demo videos, but maybe you have demo videos that you'd like to use instead. This is very easy to tweak. This will be installed right with the add-in. Without having to go in and script things, the add -in is just looking for a certain row in this data set, so very easy to change that. One thing you might be asking, and maybe you've heard of this, is with JMP 17, we also got a new feature called Search JMP. You might be asking, "Well, when would I use DAD instead of Search JMP? What is the difference?" We did a nice job here of laying out the main differences. Search JMP is built right into JMP, and I'll show you what this looks like here in a minute, whereas the Data Analysis Director or DAD is installed as an add-in, so you won't have Dad by default, you'll have to go and install it. And the Data Analysis Director is really directed for new users, maybe people who aren't as familiar with JMP or aren't as familiar with statistics in general, where Search JMP can be used by anybody. You just need to know what you're looking for. So maybe if you don't know the technique, the Data Analysis Director is a better place to go. But if you happen to know the name of the analysis you want to do, Search JMP is an easier way to find that. You also get those example videos, example methods, and those workflows inside of JMP. For Search JMP, this is not example -based. 
This will launch the analysis for you, but it will not walk you through an example. It is also more comprehensive, the Search JMP. Data Analysis Director, we picked some of the things JMP can do and highlighted that. Search JMP will look through everything inside of JMP, including the help, the sample data, the scripting index, all of that. If I am inside of JMP here, any window open under Help, it's the second thing on the Help menu, and it's Search JMP. For folks who haven't seen this before, if I start typing in something like T-test, Search JMP will automatically open this up, and I can go to Topic Help, I can go to Go, I can launch this. You can see it's a lot like that launch analysis inside of the Data Analysis Director. Again, the difference is I just need to know the name of the technique I'm looking for. That's the difference between these two tools and when I might use DAD versus Search JMP. To summarize what we've talked about here, really, the whole point and motivation of the Data Analysis Director is to help new users determine which tool to use when. This all dates back to that survey Mia was using to figure out what people need the most help with, and that was a surprise result of that survey that came out of the S TIPS development. And then we want to help new users navigate JMP, so get over that initial hurdle of coming from either a different statistical tool or just not being as familiar with statistics. And really, I think Mia put this great when she said we really want to just help democratize statistics, help scientists and engineers who maybe haven't taken many stats classes be able to find what analysis they need more easily. And we do this with examples, applications of different methods. And like I showed, you can customize DAD to make it your own. So put in your own examples. If there's something very relevant to you and your company, put it in there. You can tweak the workflows, you can adjust the examples, the example data sets, all of that's really straightforward. And when we compare that to Search JMP, Search JMP is a great tool to find what you're looking for when you know the name of the test you're looking for. We will post this in the JMP user community. This is a free add-in. Here's the link to that add-in. You can also just search Data Analysis Director JMP in Google, and it'll be your top result. A lso for anyone who hasn't taken STIPS, we strongly encourage you to do it. It's a free online course, really gets to the core of how to use statistics in general, not just in JMP. It will walk through the examples in JMP, but it's a great course for folks who are familiar with JMP or not. A couple of people we'd like to thank. Julian Parris helped a lot on the front end with this. And then Don McCormack was really instrumental in us finishing off the add-in. He developed a lot of that application and interface you see there. We also had many other people who have tested and provided feedback. And of course, Evan for the lead developer of Search JMP, and he's right here at the conference. So if you have questions, please stop by the developer booth and talk to him. Thank you for your time. Hopefully, you found this useful and you will go and check out our Data Analysis Director inside the community and provide any feedback for any future development we might want to do on this. Thank you. And Mia, any last thoughts? No. Great job. Thank you. Thanks.
Labels
(10)
Labels:
Advanced Statistical Modeling
Automation and Scripting
Basic Data Analysis and Modeling
Consumer and Market Research
Data Blending and Cleanup
Data Exploration and Visualization
Design of Experiments
Predictive Modeling and Machine Learning
Quality and Process Engineering
Reliability Analysis
0 attendees
0
0
Pythonless Python Integration for JMP® (2023-EU-30MP-1265)
Saturday, March 4, 2023
Challenges with a JMP® and Python integration resulted in a search for an alternative solution that would allow for the evaluation and testing of the various Python libraries and powerful algorithms with JMP. This would enable JMP users to work with Python from a familiar JMP environment. After a few different iterations, a REST API service was developed, and when JMP calls this service, it dynamically creates a user interface based on the options the service currently provides. The JMP user can then utilize this user interface to employ different algorithms such as HDBSCAN, OPTICS, and UMAP by sending data directly from JMP in one click. After the algorithm has finished its operations on the server side, it will return data to JMP for further analysis and visualization. Welcome to the Pythonless Python Integration for JMP, presented by Murata Finland. My name is Philip O'Leary. Briefly about Murata: we are a global leader in the design, manufacture, and supply of advanced electronic materials, leading-edge electronic components, and multifunctional high-density modules. Murata innovations can be found in a wide range of applications, from mobile phones to home appliances, as well as from automotive applications to energy management systems and healthcare devices. We are a global company, and as of March 2022 there were approximately 77,500 employees worldwide, just under 1,000 of them in Finland, where we are located. Our product lineup here in Finland includes accelerometers, inclinometers, gyroscopes, and acceleration and pressure sensors. Our main markets are automotive, industrial, healthcare, and medical. Today we have two presenters, myself, Philip O'Leary, and my colleague, Jarmo Hirvonen. I've been working in the ASIC and MEMS industry for over 40 years, 32 of which have been here at Murata. I've had several roles here and have come to appreciate the importance of data within manufacturing. My most recent years have been devoted to helping the organization benefit from the vast amount of data found within manufacturing. I currently lead Murata's data integration team. Jarmo, perhaps you'd like to give a few words on your background. Yes, sure. Hi, I'm Jarmo Hirvonen and I work in Philip's team as a data integration and data science specialist. I have been using JMP for four and a half years, approximately the same time that I have been working at Murata. I'm a self-taught programmer; besides a couple of basic courses at university, I have studied programming on my own. In my position, I do a lot of JSL scripting. I write add-in scripts, reports, automation, basically almost everything you can script with JSL, as long as it stays mostly inside JMP. I'm an active JMP Community member, and I'm also a super user there. Due to my background with JSL scripting, I'm also a steering committee member in the community Scripters Club. I have also written, I think, nine add-ins at the moment that have been published to the JMP Community. Feel free to try them out if you are interested. Thank you. Thank you, Jarmo. This is the outline for the presentation that we have for you today. As this session has been recorded, I will not read through the outline, as you can do so yourselves afterwards. Why do we have the need for a JMP Python integration? Well, basically, we are very happy with the performance and the usage we have of JMP. It doesn't require any programming for basic usage, and we see this as a big advantage.
JMP's visualization and interactive capabilities are excellent. The majority of people performing analysis at Murata in Finland are already using JMP. We have a large group of people throughout the organization using JMP, and we want to maintain that. However, on the Python side, we see that Python has powerful algorithms that are not yet available in JMP. We already have people working with Python in various different applications, and we have models within Python. We want to support these people and also help others understand and take advantage of the Python world. Basically, we want to take advantage of the wide use of JMP here at MFI and offer JMP users access to some common Python capabilities without the need to program themselves. I'll continue here and share my screen. JMP already has a Python integration, so why are we not using that? Basically, there are two groups of reasons: JMP, and us, our team. My experiences regarding JMP are from JMP 15 in this case. A JMP update at least once broke this integration, and it caused quite a few issues for us because we couldn't use the Python JMP scripts anymore unless we modified them quite heavily. Getting JMP to recognize different Python installations and libraries has been quite difficult, especially if you are trying to work on multiple different installations or computers. Also, JMP didn't, at least back then, support virtual environments, which are basically necessary for us. Then on our team's side, we don't have full control of the Python versions that JMP users are using, or the libraries and packages they are using, because not everyone is using JMP as their main tool. They might be using Python, and they have some versions that don't work with JMP, and we don't want to mess with those installations. Also, in some cases, we might be running Python or library versions that JMP doesn't support yet, or maybe it doesn't support old versions anymore. What is our current solution for this JMP Python integration? We are basically hosting a Python server using a web framework. We can create endpoints on that server, and behind them there are different algorithms. We communicate over a REST API between JMP and the server. This is the biggest benefit: this way we can use JMP with the server. But we also have a couple of additional benefits. We can have centralized computing power for intensive models; for example, we don't have to rely on the laptop to perform some heavy model calculations. The server is not just limited to JMP; we can also call the endpoints from Python or, for example, R. And we are not dependent on the JMP-supported Python and library versions anymore; we can basically use whatever we want to. Next, I will go a little bit away from the PowerPoint to JMP and show a little bit of the user interface. First, I will explain some terminology which might appear here and there in this presentation. We have endpoints; basically, this path here is the endpoint. These come directly from the server. Then we have methods; the method is the last part of the endpoint — t-SNE and XGBoost in these two here. Then we have parameters, this column, and these are basically the inputs that we will send to the server. Then we have what we call the stack. It's the collection of stack items; one row is a stack item that we can send, one after another, to the server. Let me quickly jump here. What features do we have? It is easy to add new endpoints.
Basically, we write the endpoint on the Python server, we rebuild the server, we rerun the JMP add-in, and this list gets updated. The add-in supports a dynamic data table list. If I change the table here, it will update here. Also, if a new table is opened — it opened on the other screen, but it doesn't really matter — you can see it here, the Untitled 3 table appeared. Then we can send data directly from here to the server, and there are multiple different options for sending. I can send these selections that I had here basically immediately, and I will show the results here. After getting the data back, we join it. These columns are from the server. We join the data to the original data table we had, and then we have some metadata that we get from the server during the communication: notes and column properties telling what method and parameters were used to get these two columns. Then we group them; if I have run multiple models or methods, it's easier to see which columns are from which runs. Then we have table scripts, which are also grouped. This is a different screen, so let me move them around. We have the stack, what was sent, and the HTTP response that comes back from the server. Then in this case, we also receive an image from the endpoint; in this case, it's a scatter plot of the t-SNE components. As I said earlier, we can send multiple items from the stack, one after another. You can build, let's say, HDBSCAN runs with different input parameters — say, 20 here and then 20 to 40 — add them to the stack, just send them there, come back when they're done, and start comparing whether there is some difference between those. Then endpoints have instructions for how to use them: a documentation link if we have one, a short description — in this case a very short description — of the endpoint, and then what each of the parameters does: minimum values, maximum values, default values, and descriptions of those. Then we also have user management. In this case, I'm logged in as a super user, so I can see these two experimental endpoints here that a basic user would not be able to even see. Then back to PowerPoint. This is, in part, how the add-in works. When the user runs the add-in, JMP will ping the server, and if the server is up and running, JMP will send a new request for the JSON that we use to build the interface. The JSON is parsed, and then the interface is built using the JMP-type classes that I will show a bit later; a custom class is created in JMP. At this point, users can start using the user interface. The user fills in the selections, parameters, data tables, and such, and then sends an item from the stack. We get the columns based on the inputs, get the data that we need, and convert that data to JSON. In this case, I call it column JSON, as the demonstration shows: normal JSON would always have the column names duplicated — each row would carry all the column names — whereas here we have each column name only once and then a list of values. This makes the object we send much smaller. Before we send the data, we ping the server again. This is done because we have different timeouts for the ping and for the request; otherwise, JMP would lock up for a long time if the server were not running and we were using a two-minute timeout, for example. Then when the server gets the data, it runs the analysis and returns the analysis results, and we join them back into the table along with the metadata, table scripts, and so on.
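As a rough illustration of this flow, here is a minimal JSL sketch of building the column-oriented JSON and posting it to an endpoint. It is not the add-in's actual code: the server address, endpoint path, and column names are hypothetical, and the exact options accepted by New HTTP Request should be checked against the documentation for your JMP version.

// Minimal sketch (not the add-in's code): build a column-oriented JSON body
// and POST it to a hypothetical clustering endpoint, then parse the reply.
dt = Current Data Table();
cols = {"X", "Y"};                                 // hypothetical feature columns

body = "{";
For( i = 1, i <= N Items( cols ), i++,
	vals = Column( dt, cols[i] ) << Get Values;    // numeric vector for one column
	vstr = "";
	For( j = 1, j <= N Rows( vals ), j++,
		vstr ||= Char( vals[j] ) || If( j < N Rows( vals ), ",", "" )
	);
	// each column name appears once, followed by its list of values
	body ||= "\!"" || cols[i] || "\!": [" || vstr || "]" || If( i < N Items( cols ), ",", "" );
);
body ||= "}";

request = New HTTP Request(
	URL( "http://analysis-server.local:8000/cluster/hdbscan" ),   // hypothetical endpoint
	Method( "POST" ),
	Headers( {"Content-Type: application/json"} ),
	JSON( body )
);
reply = request << Send;            // response text returned by the server
result = Parse JSON( reply );       // e.g., cluster labels to join back to dt
Show( result );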
At this point, users can continue using JMP, send more items from the stack, or maybe even jump to Graph Builder and start analyzing the data that they got back from the server. These are the JMP-type classes. We have different classes for the different types of data we get from the server. We have Booleans — in JMP this is a check box — columns, enumerators (this would be a combo box), Type Number, Type String, and Not Implemented, which is basically used to check that the server is correctly configured. This is a quick demonstration of one of those, Type Column. On the server side, it has been configured like this. When we request the JSON, it will look more like this. Then this Type Column class will convert it into an object that will look like this in the user interface. From here you can see that, for example, minimum items is one; it's the same as the minimum here. Max items, same thing. Then modeling types have also been defined here. We can limit minimum and maximum values and so on based on the schema we receive from the server. All of these are made by the custom JMP classes. This is an enumerator, some options, then number boxes, and here is the Boolean. Now Phil will continue with a couple of demonstrations of the PyAPI interface. Thanks, Jarmo. All demonstrations today will be performed using standard JMP 16. There are three demonstrations I'd like to go through, each having a different task in mind. For the first one, I'll just open the data set. This is a data set which contains probe, or test, data from five different products. It's a rather small data table, just to ensure that we don't get caught out for time. We have 29 probe parameters for five products within the same product family. The task at hand is to try to determine quickly: do we have anomalies, or do we have opportunities for improvement? Looking simultaneously at these five different products and 29 different parameters, could we identify something that could help reduce risk, or something that perhaps could reduce cost and improve yield? One possible way to do this, of course, would be the one-factor-at-a-time approach, whereby we would just manually march through all the different data, all the different parameters, and look for patterns. Very inefficient; for 29 parameters it's okay, but some of our products have thousands of parameters, so it's not the best way to approach the task at hand. Another possibility would be to take all of these parameters and put them through some clustering algorithm to see whether we could find groups naturally from the data that we have. I want to use the JMP-PyAPI interface that we have here. Jarmo already explained briefly how it works, but I will demonstrate it. The intention I have now is to run an HDBSCAN. I'm going to run the scan on all the probe parameters. I'm going to use the default settings; the default settings are typically already quite good. And I'm going to send this... I'm not going to make a big stack; I'm going to send this setting straight for analysis. We can see rather quickly that the algorithm came back and suggested that I have clusters. There are actually three clusters and one grouping of wafers which do not, in fact, belong to any of the clusters. Knowing that I have five products, I'm going to go with this for the sake of demonstration. I can see from here a histogram of the number of wafers in each cluster, but it doesn't really give me a good visualization of what's going on. I'm going to also do a dimension-reduction procedure.
If I go back into the same interface, now I'm going to do a t-SNE dimension reduction on the same parameters and send it immediately. We wait for the dimension-reduction algorithm to do its job, and it will return two components, t-SNE 1 and t-SNE 2, against which I can actually visualize the clusters that the HDBSCAN gave me, such that I now plot t-SNE 1 against t-SNE 2 and colour code them in accordance with the clusters that have already been identified. As I said, we have three clusters and one grouping of wafers which don't necessarily belong to a cluster. Maybe somewhat disappointing, knowing that I have five different products. Thankfully, I have an indicator of the product; it's here. I said this is actually frustrating, because now I have two different products being clustered as being the same. In actual fact, this is the medical application of the same automotive part; the parts are identical, so them being in the same cluster is not a problem. This part is rather unique. It's different to the other products in the same family, such that it got its own cluster with a few exceptions, so it's quite good. Then the B2 and the B4 versions basically have the same design. What concerns me is that the B4 has been allocated to cluster 1 and also has a lot of minus ones for wafers of the same product type. I'd like to further investigate what this might be due to, so I have scripted to the table that I want to make a subset of this SENSORTYPE NR SA AB4, and then I'm going to plot the differences for every parameter by cluster minus one and cluster one. Here we see the parameters in question, and the biggest differences are observed for Orbot 1 and Orbot 2. I'm not going to get into the parameters themselves; suffice to say that some parameter differences are bigger than others. Now that I know that these exist, I'd like to check across all the wafers in this subset how Orbot 1 and Orbot 2 actually look. Here we see, in fact, that the ones which have been allocated minus 1, that is, not belonging to the cluster itself, have a much higher value of Orbot 1. In fact, this anomaly is a positive thing, because the higher the Orbot value is, the better. We see that there's quite a large group of wafers having considerably larger values of Orbot than what we would typically see. The next step, of course, would be to do a commonality study to figure out how this has happened, where the wafers have been, what the process has been like, and look for an explanation. So we can see that a multi-product, multi-parameter evaluation of outliers or anomalies can be performed very quickly using this method. I will now move on to the second demonstration. I just need to open up another file. This application is very different. It is a collection of functional data. In fact, these are bond curves, curves which occur in our anodic bonding process when we apply temperature, pressure, and voltage across a wafer stack to have the wafer, the glass and the silicon, bond together. If we look at individual wafer curves, we can see that each wafer has a similar but still unique curve associated with it. We can see the bonding process time and the associated current. The task I would like to...
The goal I would have, if I just remove the filter, is that I would like to know, without having to look through — in this case, 352, but we would have thousands of these every week — how many different types of curves I actually have in my process. Then, tying that in with the final test data, can this curve be used to indicate a quality level at the end of the line? In order to do this, I'm going to split the data set. Now I put the time axis across the top and the current in each column. The first thing that I do after this splitting is to again go back to our PyAPI interface, and I'm going to look at the split data. What I want to do is a dimension reduction, because you can see that I have many, many columns, and it would be much better if I can reduce the dimension here. Again, I'm going to do a t-SNE analysis. I'm going to send it straight to the server, and we can see that the algorithm has come back with two components. I can demonstrate very quickly what they look like. The 352 wafers which were represented by functional data, curve-type data, a few minutes ago are now represented using a single point for each wafer. Now, having reduced the dimension of the data, I'd like to perform a cluster analysis next. Again, I'll go back to my PyAPI. I'm now going to do an HDBSCAN on the t-SNE components. I just need to check on this analysis what would be a suitable level. If I send it immediately and colour code the clusters, you can see that clusters have now been allocated to the t-SNE components. This is the first-level analysis using the t-SNE — sorry, using the HDBSCAN defaults. I could, of course, try another setting. I could perhaps, thinking out loud, run it with 25 wafers — a batch of wafers — and half-wafer batches are things that would be of interest to me, and look to see what the clustering would now look like. Now, all of a sudden, I have many more clusters. Of course, it does take some subject matter expertise; you need to know what clusters you would expect. In this case, I said, okay, a natural rational group for us within manufacturing would be a lot of wafers: wafers are processed in 25-wafer batches, and sometimes we have half-wafer batches, which we do experimental runs on, and so on and so forth. Now we can see that we have clusters associated with the different types of curves. I'm going to shorten this demonstration rather than have you watch me do joins and so on and so forth. What I'm going to do is take this cluster column and put it into the original data. It's, of course, opening on another screen. If I do cluster overlays, we can see... This is the original data where at first I showed you each individual wafer bond curve. Now we can see that we were able to identify the distinct differences between seven clusters and one group of wafers which don't belong to any particular cluster. We can see that very quickly we've been able to go through large numbers of wafers, determine similarities between them, and come up with clusters. To bring this even one step further, we can take a look at the actual t-SNE components, the coloured clusters, and have a quick look at the actual contents. We can see this is cluster minus one; they seemingly have something with a very high bond current at the very beginning, while cluster zero has a very high bond current at the end.
You can see that if we were to spend enough time on this, you would see lots of similarity between the bond curves within each cluster. So this was a short demonstration of how to take functional data from hundreds of wafers, cluster them, and, with the various visualization techniques within JMP, clearly identify and present the different groupings that exist within the data sets so that people understand them. This concludes my demonstration number two. I have one more demonstration. This is maybe, in some respects, for some, a fun demonstration. Again, it's not a real wafer, but I'm playing with the idea that I have a silicon wafer and there is some noise. This is a defect layout from an automated inspection tool; this data has been simulated. The purpose of having this simulation is to look for scratches or patterns found in a defect data layout. This is rather easy and straightforward if I don't have noise, and I can see that there is noise associated with this data set. What I want to determine is: can I find a way to identify these three spirals, assuming that they simulate some scratch? In fact, they're not very similar to a scratch, except that they are patterns having high-density defects in a small area. That's the main purpose of using it, rather than showing you actual wafer automated visual inspection data. The task at hand is to try to identify the spirals from this data set. I'm going to use, again, a clustering method. The table I will use is the spiral data with noise. As Jarmo pointed out, we can run several settings, because I don't know the right ones; obviously, putting the numbers of wafers in here, 25 and 12, won't help me, because I'm looking at a single wafer. The numbers I put in should be somehow representative of how many defects are typically seen within a scratch and what the smaller sample sizes associated with clusters are, and so on and so forth — minimum samples. Being a complete novice, I don't know, so I'm going to put in some numbers to play with. Twenty-five would be the minimum cluster size, with a minimum sample size of zero. Add to stack, and then I say, okay, well, this is rather inexpensive to do, so I'm going to add... You're missing the columns. Oh, sorry. Thank you. This will help. Let me clear the stack; in my enthusiasm to move forward, I did not include what I should have. Let me start again. Thank you. Twenty-five minimum cluster size, minimum sample size, add to stack. Fifty minimum cluster size, add to stack. Seventy-five — I'm allowing the scratches to be bigger and bigger — add to stack, 100. They are not necessarily bigger and bigger, but they would have more and more defects associated with them. Add to stack. And then I'm going to add another combination of 75 and two, add to stack. I could just take one of these and run it — I could select one and run — but I'm not. I'm going to be greedy; I'm going to run the whole stack at the same time. I'm going to run one, two, three, four, five cluster analyses against the data that I've taken from this wafer. I send the whole stack, and... something has gone wrong. All my clusters are showing minus ones. Let me try this again. To make a long story short, and given the fact that this is being recorded and we don't want to start again from the beginning: I'm not sure why this has disappeared, but let me try it one more time. The table I need is the noise table. I'm taking HDBSCAN, X, Y features, X, Y, 20.
I'm going to take a shortcut: 75 and two, send immediately. Now, thankfully — I don't know whether I had selected incorrectly last time, the table or whatever — now that we're here, I put in a few more of these, send immediately, and so on. As I said, we could have run quite many. The idea then is to look at the layout and try to determine: with this particular setup, is it finding good clusters, or is it all minus one? It says no, you're not finding anything there. Then if I colour code by the other clusters, it has in fact found, quite well, lots of points that don't belong to any cluster, and then three individual spirals which are very well identified. You might think, what's the benefit of this? Well, now that I know what typical scratch content looks like, I could in fact open up another wafer. If I open up data from another wafer and make the plot of the layout, we can see that there are no scratches on this wafer; it's only noise. What would happen then if I run the same setup? My wafer is another wafer, I'm doing it on X and Y, and I'm looking to determine, based on my best settings for how I should be able to find scratches, 75 and two, send immediately and plot with clusters. We only have minus ones, so nothing has been detected as a scratch. Having this possibility to run this algorithm against wafers in the database, I could make a collection of wafers that have scratches and wafers that don't have scratches — or spirals, in this case — and then use that data as an input to a commonality study to try and determine which machines in the production line are resulting in the scratches on the wafers. This concludes the third demonstration. Now I'll hand it back to Jarmo. I'll take that. We have a couple more slides left. Here are a couple of ideas we have for possible future development: using a DoE approach for the stack building — basically what Philip did by hand, but using DoE, so min and max values and so on — and then sending that whole stack. Then a metadata viewer, so you can compare the results; trying JMP 17's new multiple HTTP requests; a local server, so we don't rely on the server being up. Also trying the new, hopefully updated, native JMP Python integration. This would allow us to have faster data transfer, possibly more; we could start testing with this application. Then trying, for example, running from Graph Builder, where we could trigger the functions; and combining different endpoints: first we could run t-SNE on the input data and then automatically cluster the t-SNE output. Then, of course, we're always adding new endpoints if we find out what we want to have. The last slide is that we will be sharing a small sample of the code. There will be a JMP file with the JMP script, the Python script, and installation instructions. You can try it; it is a quite simple user interface which will send data to a local server, and you will get the data back. It also has some ideas in the instructions sheet that you can try to implement if you're interested in trying this approach for the JMP Python integration. That's it from us. Thank you. Thank you also from me. If you need to contact us, you can do so via the community.
Labels
(11)
Labels:
Advanced Statistical Modeling
Automation and Scripting
Basic Data Analysis and Modeling
Consumer and Market Research
Data Access
Data Exploration and Visualization
Design of Experiments
Mass Customization
Predictive Modeling and Machine Learning
Quality and Process Engineering
Reliability Analysis
0 attendees
0
0
A Family of Orthogonal Main Effects Screening Designs for Mixed Level Factors (2023-EU-30MP-1272)
Saturday, March 4, 2023
Bradley Jones, JMP Distinguished Research Fellow, JMP. There is scant literature on screening when some factors are at three levels and others are at two levels. Two well-known and well-worn examples are Taguchi's L18 and L36 designs. However, these designs are limited in two ways. First, they only allow for either 18 or 36 runs, which is restrictive. Second, they provide no protection against bias of the main effects due to active two-factor interactions (2FIs). In this talk, I will introduce a family of orthogonal, mixed-level screening designs in multiples of eight runs. The 16-run design can accommodate up to four continuous three-level factors and up to eight two-level factors. The two-level factors can be either continuous or categorical. All of the designs supply substantial bias protection of the estimates of the main effects due to active 2FIs. I will show a direct construction of these designs (no optimization algorithm necessary!) using the JSL commands Hadamard product and direct product. Hello. My name is Bradley Jones. I lead the team of DoE and reliability in JMP, and I want to talk to you today about a family of orthogonal main effects screening designs for mixed-level factors. This is a subject I'm really excited about. We've just submitted a revision to the paper for this, and I'm hoping that it will get accepted so we can include it, along with definitive screening designs, in the DoE platforms in JMP. Let's get started. My collaborators for this work are Chris Nachtsheim, who's a Professor at the University of Minnesota Carlson School of Business, and Ryan Lekivetz, who's a member of my DoE team at JMP. Here's my agenda. I'm going to start with a little bit of history and some technical preliminaries. Then I'm going to describe three different constructions for these orthogonal mixed-level screening designs; there are three different ways that we can make them. I'll show you the JMP scripting language for creating these designs. That will only be necessary until we can get them built into JMP itself. Then I'll spend a little bit of time looking at the design properties for designs constructed under these three methods. I'll discuss data analysis for these designs and show an example in JMP, and then I'll give a summary and some recommendations at the end. Let's start with some history and motivation. The first screening designs were fractional factorial designs, like Plackett-Burman designs, or non-regular fractional factorial designs, or regular fractional factorial designs. For these designs, every factor was at two levels only. Engineers that I have talked to have felt uncomfortable about these designs because they felt that the world tends to be nonlinear, and two levels just isn't sufficient to capture nonlinearity in the effect of a factor on a response. Then in 2011, definitive screening designs arrived. Here all the factors were assumed to be continuous and each factor was at three levels, which allows you to fit curves to the relationship between factors and responses. At the bottom there is the reference for the paper that first introduced these designs in 2011. Now, there are some pros for our initial implementation of DSDs and also some cons. Let's go through the pros first. The pros are that, at least in our original implementation, six-factor, eight-factor, and 10-factor definitive screening designs had orthogonal main effects, but we were unable to get orthogonal main effects for more factors.
It turns out that a year later, somebody published a nice way of getting orthogonal designs for definitive screening experiments for every even number of factors for which conference matrices were available. That was a big advance. Another good thing about DSDs is that the main effects are orthogonal to two-factor interactions, so the estimate of a main effect will never be biased by any active two-factor interaction. The really exciting aspect of DSDs is that all the quadratic effects are estimable, which is not possible with two-level screening designs; even with center points, you can detect that there is a nonlinear effect, you just don't know where it's coming from. Then finally, in DSDs, if there are six factors or more and only three of the factors turn out to be important, then at the same time you do screening, you can also do response surface optimization. You could maybe get, if you're lucky, a screening design and a response surface design to optimize a process in one shot. You have to be lucky, of course; you have to have three or fewer active factors. The cons of the initial implementation of definitive screening designs are that, first, they couldn't accommodate categorical factors. Secondly, they couldn't accommodate any blocking. Thirdly, some researchers have pointed out that for detecting small quadratic effects — that is, quadratic effects where the size of the effect is about the same order as the error standard deviation — there is low power. Of course, if the quadratic effect is big, then you can detect it, especially if it's three times as big as the error standard deviation. Now, after the original publication of DSDs, we were well aware that it was a problem that we couldn't accommodate two-level categorical factors. So we wrote a new paper in the Journal of Quality Technology in 2013 that showed how to incorporate two-level categorical factors. Then in 2016, we wrote another paper in Technometrics that showed how to block definitive screening designs using orthogonal blocks, blocks that are orthogonal to the factors. We were trying, step by step, to address the cons that were associated with the original implementation of this methodology. Another thing that we noticed was that people were having a little bit of trouble knowing how to analyze definitive screening designs. We invented an analysis technique that was based on our understanding of the structure of a DSD. It took particular advantage of the special structure of a DSD to make the analysis sensitive to that structure, rather than trying to use some generic model selection tool like stepwise or the lasso or one of those. This made it possible for a non-expert in model selection to use this out-of-the-box automated technique for analyzing a definitive screening design. That was in 2017, but there were still problems. First, we did write that paper that added categorical factors to a DSD, but if you had more than three, the quality of the design went down, and that was undesirable. In fact, if you had too many categorical factors, things didn't look good at all. That was an issue. Again, quadratic effects have been pointed out to have low power if they're small. The purpose of this talk is to introduce a new family of designs that addresses these issues. Here we go. First, I have to go through some technical preliminaries to explain what we need to be able to do in order to build these designs. I'm going to start out by talking about conference matrices.
The conference matrix is the tool that the second paper on DSDs, in 2012, used to introduce the orthogonal DSDs for twelve factors, 14 factors, 16 factors, and so on. They used conference matrices to make that happen. I need to show you what a conference matrix is. You can see here a conference matrix for four factors with four runs: there are zeros on the diagonal elements and ones and minus ones off the diagonal. The cool thing about a conference matrix is that if you multiply the transpose of the conference matrix by the conference matrix, you get the identity matrix times the number of rows in the design minus one. It's an orthogonal design. Now, conference matrices exist when the number of rows and columns is equal; they're square matrices, and they only exist when the number of rows and columns is an even number. There's a conference matrix for every even number of rows and columns, from two rows and two columns up to 30 rows and 30 columns, except for the case where the number of rows and columns is 22. For 22 rows and columns, the conference matrix has actually been proven not to exist, so there's no way to construct one, sadly, although I can't prove that result myself. Okay, the next thing I need to talk about is something called a Kronecker product, which uses that circular symbol with an X in the middle of it, so that when you see that in an equation, it means you want to make a Kronecker product. The Kronecker product is also called a direct product, and in fact the JMP scripting language, JSL, makes Kronecker products of matrices using the Direct Product command, not a Kronecker Product command. The Kronecker product of a vector of one stacked on top of negative one with a conference matrix stacks C on top of negative C, as below. What the Kronecker product does is, for every element in the first matrix, it substitutes that element times the second matrix. So one times the conference matrix is just the conference matrix, and negative one times the conference matrix is the negative of the conference matrix. Basically, a Kronecker product of one and minus one with a conference matrix just stacks the conference matrix on top of its negative. Here's a case where I did just that. You have a four-by-four conference matrix on top of its foldover, which is also four by four, and if you were to add a row of zeros, you'd have a four-factor definitive screening design. Conference matrices are useful, and Kronecker products are also very useful for constructing designs, as it turns out. I have a few more preliminaries to go over. Let me talk a little bit about Hadamard matrices. Hadamard matrices are also square matrices, but they're constructed of ones and minus ones. We have Hadamard matrices built into JMP for every multiple of four rows and four columns, up to 668 rows and 668 columns. So for every multiple of four, we can support a Hadamard matrix. Well-known Hadamard designs are the Plackett-Burman designs and the two-level fractional factorial designs; these are both Hadamard matrices. Hadamard was a French mathematician who lived in the late 19th century, and he invented this idea. If my Hadamard matrix has m rows, its transpose times itself is m times the identity matrix. That means the Hadamard matrix is orthogonal, number one, and number two, it has the greatest possible information about the rows and the columns in the matrix that is possible, given that you're using numbers between negative one and one.
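To make these two preliminaries concrete, here is a small JSL check. The particular 4-by-4 conference matrix written out below is just one valid example chosen for illustration; it is not necessarily the matrix JMP itself would generate.

// One valid 4-by-4 conference matrix: zeros on the diagonal, +/-1 elsewhere.
C = [0  1  1  1,
     1  0  1 -1,
     1 -1  0  1,
     1  1 -1  0];

Show( C` * C );                  // equals 3 * Identity( 4 ): (number of rows - 1) * I

// The Kronecker (direct) product of (1, -1)' with C stacks C on top of -C,
// i.e., a definitive screening design without its center run.
DD = Direct Product( [1, -1], C );
Show( DD );                      // 8 rows, 4 columns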
They're very valuable tools for constructing designs. Now we have everything that we need to show how to construct these new designs. We call them orthogonal mixed-level designs, or OMLs. They're mixed level because half of the columns, or factors, are at three levels — those are for the three-level continuous factors — and half of the columns, or factors, are at two levels, and they're for categorical factors or for continuous factors for which we're not worried about nonlinear effects. Here's the first method for constructing one of our OMLs. Suppose C sub k is a k-by-k conference matrix and H sub 2k is a 2k Hadamard matrix, so H sub 2k is a Hadamard matrix which has twice as many rows and columns as the conference matrix has. Then, if we stack a conference matrix on top of its foldover and then replicate that, we get this matrix DD, which is just C, negative C, C, negative C, all stacked on top of each other. DD is two DSDs stacked above each other, minus the two center runs that DSDs normally have. Now for HH: DD ends up having 4k runs, because there are four conference matrices with k rows and columns stacked on top of each other, so there are 4k rows in this design and k columns. HH is just the Hadamard matrix stacked on top of its foldover design. Since H has 2k rows and columns already, stacking it on top of itself makes it have 4k rows, just like the DD matrix has. It turns out that you can just concatenate these two matrices horizontally to make an orthogonal mixed-level design. The DD part of it has three levels per factor, and the HH part of it has two levels per factor. You can see that there are k three-level factors and 2k two-level factors. Therefore, in a 4k-row design you can have as many as 3k columns. For example, if your k was six, you'd have 24 rows and 18 columns: six of the factors would be at three levels and twelve of them would be at two levels. Now you have way more two-level factors and you haven't lost any of the features of the definitive screening design; the main effects of this design are orthogonal to each other. Here's an example where I constructed an OML from a 6-by-6 conference matrix and a 12-by-12 Hadamard matrix. You can see there are 24 rows in this matrix and 18 columns. The first six of them are the six three-level columns, and the next twelve are the twelve two-level columns. Now, of course, you don't need to use every column of this design. You could still use this design even if you had, say, four or five three-level factors and seven two-level factors. You just remove five of the two-level factors and a couple of the three-level factors, and it's sort of arbitrary which ones you might remove. Here's the second construction approach. C sub k is a k-by-k conference matrix, and H sub k is a k-by-k Hadamard matrix. Now we're going to create DD the same way we did before; DD is just a definitive screening design stacked on top of itself, minus the two center runs. HH is a replicated Hadamard matrix on top of the same Hadamard matrix folded over twice. Now, if you look at the two columns of ones and minus ones, you might notice that those two vectors are orthogonal to each other. That's what makes this particular construction really, really powerful. In this case, the design has 4k rows and only 2k columns: k of the factors are at three levels and k factors are at two levels. The number of runs in this design is twice the number of columns, but that's still a very efficient number of runs given the number of factors.
It has the same efficiency as definitive screening designs, in fact; definitive screening designs have twice the number of factors plus one runs. Here's an example created using a 4 by 4 conference matrix and a 4 by 4 Hadamard matrix. When you stack them on top of each other four times, you get eight columns and 16 rows. Columns A through D, you can see, are at three levels, because you can see the zeros, and there are four zeros in every column. I should point out that in a definitive screening design there are only three zeros in each column; having an extra zero makes the power for detecting a quadratic effect a little higher than for the definitive screening design. That's the second construction method. The third construction method is very similar to the second, except that you add the Hadamard matrix to the design in two different ways. The DD part is the same as in the first two construction methods. For the HH part there are two blocks: one built with the vector 1, 1, -1, -1, and the other with the vector 1, -1, -1, 1. The three vectors of ones and negative ones are all orthogonal to each other, and that yields an orthogonal main-effects design. The third construction again has 4k rows and 3k columns, that is, k factors at three levels and 2k factors at two levels. Those are the three methods. Here's an example of that construction using a 4 by 4 conference matrix and a 4 by 4 Hadamard matrix. The result is a twelve-column design with 16 rows: twelve factors in 16 rows, a very efficient design for looking at twelve factors. It's also orthogonal for the main effects, and the main effects are orthogonal to two-factor interactions. Now I want to show you three scripts for creating these designs using JSL. In the meantime, before we drop this methodology into JMP, you can create these designs with a very simple JSL script. The first command creates a conference matrix, in this case with six rows and six columns. Then D is the direct product of the vector 1, -1, 1, -1 and C; that gives you the matrix DD that we saw in the constructions. Then H is a Hadamard matrix with twelve rows and twelve columns. Notice that twelve is two times six: we required that for the first construction, where the Hadamard matrix has to have twice as many rows and columns as the conference matrix. We make HH by taking the direct product of the vector 1, -1 with H, the Kronecker product construction. That gives you an HH matrix with 24 runs and twelve columns. The last step is to horizontally concatenate D and HH to produce ML, which is just OML shortened to ML, and then the As Table command makes a data table out of that matrix. The OML we just created has 24 rows and 18 columns: six of the columns are factors at three levels, and twelve of the factors are at two levels. Now, the six in the first line can be replaced by 8, 10, 12, 14, and so on up to 30, except for 22, and the twelve in the third line must be twice whatever number you put in the first line. You can use this construction to create all kinds of OML designs just by changing the numbers in the first and third lines. Here's the second construction method script. I start again with a conference matrix, this time with four rows and four columns. D is the direct product of 1, -1, 1, -1 and C; that's the same as before. This time I make H a Hadamard matrix of four, rather than twice the number in the first line.
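Ahead of the walkthrough that continues below, here is a hedged JSL sketch of what the second and third construction scripts could look like; again the 4 by 4 conference matrix is hard-coded, and Hadamard() is assumed to be the built-in function the speaker describes.

```jsl
C = [ 0  1  1  1,
     -1  0  1 -1,
     -1 -1  0  1,
     -1  1 -1  0];                           // a 4 x 4 conference matrix
H = Hadamard( 4 );                           // 4 x 4 Hadamard matrix, same size as C
DD = Direct Product( [1, -1, 1, -1], C );    // 16 x 4: the three-level part

// Construction method 2: 16 runs, 4 three-level + 4 two-level factors
HH  = Direct Product( [1, 1, -1, -1], H );   // 16 x 4, orthogonal to DD column by column
dt2 = As Table( DD || HH );

// Construction method 3: 16 runs, 4 three-level + 8 two-level factors
HH1 = Direct Product( [1, 1, -1, -1], H );
HH2 = Direct Product( [1, -1, -1, 1], H );   // the two +/-1 vectors are orthogonal
dt3 = As Table( DD || HH1 || HH2 );
```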
I need a vector with four elements to take the direct product with H, so I use 1, 1, -1, -1 and take the Kronecker product, or direct product in JSL speak, of that with H. I get a design with 16 rows and eight columns by horizontally concatenating the pieces; the double vertical bar operator horizontally concatenates two matrices, and then As Table makes a data table out of the result. This second construction has 16 rows and eight columns: four factors at three levels and four factors at two levels. The four in the first and third lines can be replaced with any even number for which a conference matrix exists. Actually, I need to correct myself: the conference matrix has to be a multiple of four for this to work, because the Hadamard matrix must be a multiple of four. Here's the last construction method. Again we have a conference matrix of four, but it could be four, eight, twelve, or 16. We take the direct product of that vector of ones and negative ones with C to get the replicated definitive screening design. Here we create a Hadamard matrix of four, but we form two different direct products: the first takes the Kronecker product of 1, 1, -1, -1 with H, and the second takes the Kronecker product, or direct product, of 1, -1, -1, 1 with H. Those are two different matrices, and they happen to be orthogonal to each other. Then we horizontally concatenate all three of these matrices and make a table from that. This design has 16 rows, because it's the four runs in the conference matrix times four, and twelve columns: four factors at three levels and eight factors at two levels. So those are three very easy JSL scripts for making these designs. When this talk goes onto the JMP Community, I'll add the JSL, and I'll also add several examples of these OML designs that you can use. Now I want to talk a little bit about the properties of these designs. Here we see the design properties for method one, and the color map on correlations shows that there are no correlations between the main effects of any of the 18 factors in this design. The three-level factors are about 10% less efficiently estimated, in terms of their main effects, than the two-level factors. That's because of the zeros in each of those columns; the zeros don't help you estimate main effects. Now I want to show you the alias matrix for this design construction method. You can see that there are a lot of main effects that are uncorrelated with two-factor interactions, but there are also a lot of main effects that are correlated with two-factor interactions. The three-level factors' main effects are not aliased with any of their own two-factor interactions, and the same is true of the two-level factors: their main effects are not aliased with their two-factor interactions, because both sides of this design are constructed from foldover designs. Still, we see that there's quite a bit of potential aliasing of main effects from active two-factor interactions, so in some sense method one is a little riskier to use than the other methods. Here are the design properties for method two. Here I'm making a design that has 16 rows for eight factors. The three-level factors have 15% longer confidence intervals than the two-level factors; again, that is because those four factors each have four zeros, so four of the 16 runs are zero instead of 1 or -1.
The cool thing about the second design construction is that none of the main effects is correlated with any two-factor interaction, so it has many of the desirable characteristics of a definitive screening design. There's a lot of orthogonality between pairs of two-factor interactions, but there are also some correlations; you can see a few of them here and there. Finally, the design properties for method three show that the three-level factors are a little less efficiently estimated than the two-level factors, with confidence intervals about 15% to 15.5% longer. We can see that there's some aliasing between main effects and two-factor interactions, but not as much as for design construction number one. In terms of risk, this construction accommodates more factors with less risk than the first construction method. I'd like now to compare a DSD to an orthogonal mixed-level design. You can make a DSD with eight three-level factors, and that would have 17 runs. If you get rid of the center run, you have a design that's directly comparable with the mixed-level 16-run design that you saw in design construction two. Now, if we compare the efficiency for estimating main effects, the definitive screening design is only 91% D-efficient with respect to the mixed-level design, the G efficiency is 92%, the A efficiency is roughly 90%, and the I efficiency is roughly 82%. You can see from the fraction of design space plot that the curve for the mixed-level design is below the curve for the definitive screening design pretty much everywhere. This design is clearly preferable to the definitive screening design, for estimating main effects at least. Now I want to talk about data analysis using an example. I created a design using the second construction, and I created a Y vector by adding random normal errors to a specific function with both main effects and two-factor interactions and rounding the result to two decimal places. The true function is this one here: the A, B, E, and F main effects are all active, and the AB, BE, and EF two-factor interactions are all active. That's the function without error; I added normal random errors with a standard deviation of one. Since this design can be fit using the Fit Definitive Screening platform within JMP, that's what I used. Here you see that Fit Definitive Screening finds all seven real effects and doesn't find any spurious effects; it gets exactly the correct set of effects. The deviations between the true parameter values and the estimated parameter values are pretty small. For example, the true parameter value for factor A is 2.03 and its estimate is 2.45 with a standard error of about 0.35, so it's just a little more than one standard error from its true value. The true value of the coefficient of B is 3.88 and I get 3.94, which is very close to its exact value. The estimated root mean squared error is 1.2, and the true random error I added had a standard deviation of exactly one. So again, this analysis procedure has chosen exactly the correct model. Now I want to do a little JMP demo that shows the actual by predicted plot you see below; the residual plot shows no indication of any problem either. I'm going to leave PowerPoint for a second and just go to JMP. Here's my data, and here's the function with no error.
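A hedged sketch of this simulation step follows; the 2.03 and 3.88 coefficients match the values quoted in the talk, the rest are stand-ins, and it assumes the 16-run table from the second construction is active with its eight design columns renamed A through H.

```jsl
// Add a simulated response: hypothetical true coefficients plus Normal(0,1) noise
dt = Current Data Table();
dt << New Column( "Y", Numeric, Continuous,
    Formula(
        Round(
            2.03 * :A + 3.88 * :B + 2.5 * :E + 3 * :F       // active main effects
            + 1.5 * :A * :B + 2 * :B * :E + 1.5 * :E * :F   // active two-factor interactions
            + Random Normal( 0, 1 ),                        // error with sigma = 1
            2 )                                             // round to two decimal places
    )
);

// After the active effects are identified (e.g., with Fit Definitive Screening),
// refit them to get the actual-by-predicted and residual plots
dt << Fit Model(
    Y( :Y ),
    Effects( :A, :B, :E, :F, :A * :B, :B * :E, :E * :F ),
    Personality( "Standard Least Squares" ),
    Run
);
```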
I can show you that; it's just the formula I showed you on the slide. Then here's the data where I've added random error to each of those values. Delta is the difference between the prediction formula for Y and the Y with no error; these values show how far we missed the true value of Y at every point. If I run Fit Definitive Screening on Y, I get what I showed you before: the correct main effects and also the correct two-factor interactions. When I combine them, I get the correct model, with an estimate of the error standard deviation very close to the true sigma. If I run this model using Fit Model, this is the actual by predicted plot I get, this is the residual plot I get, and here's the prediction profiler showing the predicted value of Y. You can see that if you look at the slope of the line for B, as I change the value of A, the slope of B changes; if I change B, the slope of A changes; and if I change E, the slope of F changes. That indicates interactions are happening. If I wanted to maximize this function, I would choose the high value for each of the factors. One other thing I did was create a profiler where I look at the true function, my prediction formula, and the difference between those two functions; this is the setting that leads to the largest difference between the predicted value and the true value of the function. That's what I wanted to show you in JMP, and I'll move back to my slides now. Let me summarize. We talked about definitive screening designs with their pros and cons. I then introduced the idea of a Kronecker product and showed how to construct these orthogonal mixed-level designs in three different ways. I showed you, and shared, the JSL scripts for constructing these designs; you can use those scripts, changing the numbers in the first and third lines, to make designs with increasingly large numbers of runs. I talked about the statistical properties of these designs, in particular their orthogonality, and also that in the case of design construction two, not only are the main effects orthogonal to each other, but the main effects are orthogonal to the two-factor interactions. Design construction two only exists for run counts that are a multiple of 16, which is a slight disadvantage; there's more flexibility with the other approaches. Finally, I showed how to analyze these designs. Let me make a couple of recommendations. Design construction method two is the safest approach, because of all the orthogonality involved and the fact that the two-factor interactions are uncorrelated with main effects. I pointed out already that you don't have to use all the columns: you can create one of these designs and then throw away certain columns to accommodate the actual number of factors you have. The advantage of doing that is that you will also get better estimates of the error variance. It's important to remember that the three-level factors are for continuous factors only. It wouldn't make sense to use these columns for three-level categorical factors, because there are far fewer zero elements than minus ones and plus ones. A couple more things. Quadratic effects, it turns out, are slightly better estimated by an orthogonal mixed-level design than by a DSD. But if you wanted to improve the quadratic effect estimation further, you could add two rows of zeros for the continuous factors; those would act like center points.
For the categorical factors, in those two added rows you can choose any vector of plus ones and minus ones for the first row, and the second row is just the foldover of that first row. That is all I have for you today. Thank you very much for your attention.
Labels (10):
Advanced Statistical Modeling
Automation and Scripting
Basic Data Analysis and Modeling
Consumer and Market Research
Data Exploration and Visualization
Design of Experiments
Mass Customization
Predictive Modeling and Machine Learning
Quality and Process Engineering
Reliability Analysis
Sowing the Seeds of Love for DOE (2023-EU-30MP-1203)
Saturday, March 4, 2023
There is a big problem with how we are educating our future scientists: we tell them that you are only allowed to change one thing in an experiment and you have to keep everything else the same. When trying to learn about the effect of many factors on a process or system, it is much more effective to change all your factors simultaneously. But people rarely learn about this method, "Design of Experiments" or "DOE." Or they only hear about it later in their careers when they are resistant to new ideas. The result is a huge waste of time and resources due to inefficient experimentation. In the summer of 2022, JMP® launched a competition with an engaging and simple experiment to demonstrate the power of DOE. In this presentation, you will hear from the contest winner and the designer of the experiment. You will hear how experimenters of all ages can get their hands dirty growing garden cress under different conditions according to a statistically designed experiment. And you will see how the results can be easily analyzed with compelling visuals, as well as using sophisticated Functional DOE analysis in JMP® Pro. I'm Phil Kay and I'm joined by Weronika, and we're going to talk about a fun experiment that we set up as a competition with the idea that it's sowing the seeds of love for design of experiments. And it's all about growing cress, which I'm sure many of you will have done when you were at school or at home. And there's a problem, I think, in how we educate our young scientists. T his is taken from the British Broadcasting Corporation's bite size for Key Stage 2, so for young scientists. The curriculum in the United Kingdom, at least, tells people about this fair test idea. And that is when you are testing something, you need to make sure it is a fair test. To do this, everything should be the same except the thing you are testing. So we're only allowed to change one thing at a time. And that's not ridiculous. It's not necessarily wrong, but it's not all of the truth as well. It's not necessarily the best way when you come to experiment in commercial R&D and industry. T he consequences of this, if we accept that we're only going to test one thing at a time, let's imagining we're experimenting to understand what affects the height of garden cress, and we want to understand what's the effect of light conditions, sunlight or dark. We're pretty sure that's going to have an effect, but we'd like to understand what it is. What's the effect of the growing medium whether we grow on soil or on cotton wool? Again, we think it's probably going to have an effect. We'd like to experiment to understand or quantify the effect. T he fair test way of doing this would be to take control conditions. So we grow some cress in sunlight and on soil. And then we do a fair test. We just change one thing. So for Fair test 1, we change the growing medium to cotton wool, and we see what the effect is. For Fair test two, to understand the effect of light conditions, we change to dark and we keep everything else the same. We keep our other factor the same. T his would be fine, fair tests would be fine, except nature doesn't necessarily play by those rules. Nature doesn't play fair all the time. And what we should really be doing in this situation is a designed experiment. And in this case, we would test all possible combinations. W e wouldn't just be changing one thing at a time, we'd make sure we tested all the possible combinations , we changed all the factors according to a strategy. 
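As a small, hedged illustration of "all possible combinations" for these two factors, a JSL table of the four runs might look like this (the table and column names are just for illustration):

```jsl
// The four runs of a 2 x 2 full factorial for light condition and growing medium
dt = New Table( "Cress 2x2 Factorial",
    Add Rows( 4 ),
    New Column( "Light", Character, Nominal,
        Set Values( {"Sunlight", "Sunlight", "Dark", "Dark"} ) ),
    New Column( "Medium", Character, Nominal,
        Set Values( {"Soil", "Cotton wool", "Soil", "Cotton wool"} ) )
);
```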
And what this enables us to do is gain a richer understanding. So we can understand things like interactions between factors. And for this cress experiment, we can see the height at day five after five days of growing, we're looking at the effect of light condition, sunlight or dark. And we can see that the effect of light, whether it's sunlight or dark, is dependent on the growing medium. F or soil, we are seeing a bigger difference between dark and sunlight than we are with cotton wool. This is an interaction. We can only understand these interactions when we use designed experiments, and these are often critical in commercial R&D. So what we need is some fun ways of introducing these ideas to young students, to students of any age. Now, let me go to a digression where we got the name of this talk from. It's from a song by a group called Tears for Fears. It's not a very new song, so if you're young, you may not have heard it. If you're a bit older, you'll probably know it because it was nominated for the best postmodern video at the MTV Music Awards 30 odd years ago, whatever best postmodern video means. The first line is, high time we make a stand and shook up the views of the common man. I don't know if I like that first line very much, but I think it's appropriate here. We'd like to shake up people's views about how we should do experiments, how we should change the factors in an experiment. I was a little bit concerned that this is a British band that people may not have heard of Tears of Fears, may not have heard of this song. So I looked at the data and actually I found that it was a worldwide hit and particularly big in Canada. It reached number one there in 1989. The Cress experiment, how did this start? Well, my colleague, Michael, in marketing here at JMP, wondered if we could make a fun experiment out of growing garden cress. That hadn't occurred to me. When I first heard this, I thought, that's a brilliant idea. What we wanted to do was create an experiment that's simple enough for anyone, for experimenters of all ages, young experimenters, old experimenters. We wanted it to be simple enough you could do it at home. One of the challenges with coming up with good examples of design of experiments is, science is generally expensive. Measuring the outputs of your scientific experiments often requires really expensive instruments. So we wanted something that was simple and cheap to do. And we wanted it to be an interesting way, just a fun way to introduce the key concepts of statistical design of experiments. W e didn't want it to be difficult, we didn't want you to have to do lots of very complex analysis. We wanted it to be very immediate and fun way of introducing these ideas. I did some experiments in the Kay Family Research Kitchen here with some assistants. I had my eight- year- old daughter and my 15- year- old daughter help me with this. My 12- year- old daughter was too busy watching DOC, I think. And it was very successful. They had a good time doing it, I think, and it started some interesting discussions. W e did this experiment and we set up the experiment so that we were growing some of them in soil, some of them in cotton, some of them in dark, some of them in light conditions. And my eight- year- old child said, "Well, Dad, it would have been easier if we just put all the soil ones in the dark and all the cotton ones in the light." I didn't say anything, so I wait for my 15- year- old to respond. 
She said, "Well, but then we wouldn't know if it was the soil or if it was the dark that had the effect." This is a beautifully concise way of describing confounding. This was a very proud moment for me as a parent that one of my children could explain this concept of confounding in a much more succinct way than I have ever managed to do. And we got great data. T he 15- year- old lost interest after we'd set it up, but my eight- year- old daughter carried on with the experiment, observing it over a number of days. W e measured the height of the tallest plant in each pot, actually within each compartment of an egg box. W e measured those and she took all the measurements and we got some really good quality data. Let me just show you first of all, though, the actual experiment. T hree factors, we tested substrate, soil or cotton wool, the light conditions, dark or light, and we used plain or curled cress types. We got two different types of cress seeds. And this is a two- to- the three full factorial for those DOE nerds out there. And we've replicated on the two to the three minus one half factorial there. So that gives us 12 runs, 12 pots, which works well because in the UK at least, egg boxes generally come in sixes. So we could use two egg boxes to do these 12 runs. And as I said, the data was very good. We can do some simple analysis. This is one of the things I like about it, is that we can just look at the ones that were grown in the light and the ones in the dark and see how the height is different after seven days. And it's very compelling, there's a very big difference. It wasn't really the difference that I was necessarily expecting, and it was an interesting surprise to all of the experimenters involved. W e can do some simple analysis, just some simple visuals. Let's just plot the heights versus light conditions. And again, you can see the big effect there, big effect of substrate, very little effect of cress type there. So introducing these simple analysis, and then we can obviously take it to a greater level of sophistication, build a full statistical model. And that brings us to the profiler, which I think is just such a great way of understanding design of experiments and statistical models. Very powerful, compelling way to understand the effects of each factor and interactions between factors as well. I f we look at day seven, we can see there's an interaction between light conditions and our substrate. And we can take it to an even greater level of sophistication because this is actual functional data. I f you're interested in Functional Data Explorer, well, this is a great example data set because we're collecting the height data as a function of time for each of the runs of our experiment. W e can use Functional Data Explorer and Functional DOE to understand how the factors affect the shape of this growth curve. We can see the rapid growth with soil versus cotton wool. We can see the rapid growth, increased rate of growth in the dark, and actually the fact that it's starting to die off towards the end of the experiment here. I was really delighted with how the experiment went. It was very simple to do, very compelling, really accurate results. It's so hard to find experiments that people can do at home where they can get an accurate, continuous quantitative response out that they can measure just with a plastic ruler in this case. We went ahead and did this as a competition. I wrote a blog post about it, which we'll provide the link to that as well. 
We ran this last summer, the summer of 2022. I'm going to introduce next our competition winner, Weronika. I've also built some visuals of Weronika's results in JMP Public; we'll share that link as well, so you can see Weronika's data, download it yourself if you log in to JMP Public, and explore the results for yourself. But now Weronika is going to show you what she found in this cress experiment and the impressive results that made her the competition winner. I think you're on mute, Weronika. Thank you, Phil, for introducing me. I would like to share with you my experience in the competition: my experience with design of experiments and with planting the cress. The main aim of the challenge was to introduce design of experiments to researchers, engineers, students, anyone at all, but also to use a designed experiment to find out which factors influence the height of the garden cress. The factors defined by the organizers, by Phil, were three: the surface, where we used cotton wool or garden soil; the light conditions, so we planted garden cress in sunlight and in the dark; and soaking, where we had to check what influence pre-soaking the seeds has on the height. What was my first impression? As Phil said, they wanted the experiment to be simple enough for everybody, but I was not so convinced at the beginning, because my previous attempts at growing plants had not gone well. I didn't expect my garden cress to behave any differently, and I wasn't wrong: my first attempt did not go well. First of all, I put so many seeds in one spot that the pre-soaked samples formed a kind of shell; they didn't germinate, so I got no plants. Moreover, my egg box was broken down by the water, which can be seen here, the marker ink was washed away so I could no longer read the numbers of the spots, and the soil migrated from one hole to the adjacent one and mixed with the cotton, especially when I watered the soil. It was not good. After my first failure I drew some conclusions about why it happened. First, I decided to use plastic espresso cups instead of paper cups, because plastic stands up to water better. Second, use fewer seeds in each cup; don't put in as many as you can, but do it smartly. And at this point I also came to the idea of adding a fourth factor to my experiment: the density of the seeds. I wanted to check not only how surface, light condition, and soaking influenced the height, but also the seed density. I set two levels, low and high. At low density I used 20 seeds and spread them evenly in the cup; at high density I took 40 seeds and tried to put them all in the middle of the cup, so it's [inaudible 00:16:09]. My design had four factors, each at two levels. I used a full factorial design, which gave 16 treatments, 2 to the power of 4. I decided to replicate eight of the treatments in order to capture variability and be able to estimate a standard deviation, so in total I had 24 test runs. The experiment was done in August, when it was very warm, so it was nice weather for planting and being a gardener. Okay, these are my results. Here we can see the design table with all the factors and the 24 test runs; I recorded the height after three, five, and seven days. In this table you can see the factor effect estimates after seven days, with the statistically significant ones marked in bold font.
The factors I found to be statistically significant were surface, light, and density. Soak turned out not to be important on its own as a main effect, but it was important in two-factor interactions: the interactions between surface and soak, and between soak and light, were significant, so we cannot assume that soaking is unimportant. Two three-way interactions were also significant, while the four-way interaction was not. Now I would like to present the pipeline, the steps, that I used in my designed experiment. I think it's quite a good approach that anyone can use. First of all, we have to generate the design. As a first step, we define which factors we want to study and at what levels. Once that is set, we choose the design type, because the choice usually depends on how many factors we have, whether only two or three or more, and how many levels. We define the number of replicates to include, and then we can generate the design table, which in JMP is very quick and convenient. Once we have the table, we run the experiment, collect the data, and enter it in the table. When we have everything, we go to the next step, estimation of the factor effects. We formulate the full regression model and estimate the factor effects, so we check which factors are important. Here you can see the main effects plots, the two-way interaction plots, and the three-way interaction plots after seven days. What is worth mentioning is the interactions. This is what Phil said: interactions are important, they happen in the real world, and here is a good example. When we have cotton as the surface, it's better not to soak; if we use soil, it's better to pre-soak the seeds. If we checked only one factor at a time, taking soil for example, we would conclude that pre-soaking is better, and we would then use pre-soaking with cotton as well, but in that case it's not true. This is the beauty of interactions, and that's why we have to take them into consideration. Then come the statistical tests, checking which effects are important. In JMP we can look at the parameter estimates and the effect tests and conclude which are significant. When we see which effects are not significant, we should redefine the model by dropping the non-significant effects and estimate it one more time; in this case it is a linear regression. But we cannot finish there: we also have to check the assumptions, that our model is statistically correct. For example, we have to check the residuals for normality. This can be done with the normal probability plot of the residuals in JMP: when the residuals follow the straight line and stay within the bounds, the normality assumption is reasonable. We can also check it with numerical tests, such as the Shapiro-Wilk test for normality of the residuals, and a test that the mean of the residuals is equal to zero. When we have finished that, we can draw the conclusions. In my experiment, the conclusion was that the most important factor was light, and its effect was about eight times larger than the effect of the second most important factor. Plants cultivated in the dark grew taller than those in the sun. The other significant factor was surface, and I found that garden soil is better: in garden soil the plants grow taller.
The fourth factor that I added, sowing density, also turned out to be important, and its significance increased over time. After three days, sowing density was not significant, but after five days it was significant, and after seven days it was even more significant, so its importance grew with time. Also, over the seven days, three different three-way interactions were significant, which suggests that all the factors really interact together and we cannot interpret them separately. Sun, soil, water, everything in nature is connected and has its own inner dialogue. Beyond the statistics, I also looked at the physical characteristics of the plants. Cress cultivated in the light became green and developed big leaves, whereas in the dark the plants were very yellowish and fragile; when I touched them, they broke. They were taller, but I would say not healthy. Also, the roots of the plants cultivated in the light grew longer. Here you can see it on the cotton, although it's difficult, because the roots are white and the cotton is white; you can still see them winding around here, while here there is just plain cotton. With soil it's easier to see, because the roots stand out against the soil, and we can see that in the light we have longer roots, whereas in the dark they are very short. To maximize the height after seven days, we should use soil, pre-soak the seeds, put them in the dark, and use high density. Here are pictures of my results. You can see that throughout the experiment the samples in the dark were yellow and thin the whole time, whereas in the sunlight they were a healthy green and thicker. Now my conclusions regarding design of experiments and my experience. Design of experiments is a great tool which can be used to optimize any process; even something like cultivating garden cress can be fitted into a designed experiment. It helps you incrementally gain knowledge about the process. For example, at the beginning I had no idea how the density would influence the height, but when I sowed too many seeds I realized that it mattered, so I decided to include it and study it properly. We can also increase our confidence that our results are indeed statistically significant, so that we are not misled by biases, and we know when interactions are involved. Of course, some factors can be aliased with others, for example in fractional factorial designs, but the advantage of design of experiments is that we are aware of which ones are confounded, and we can draw proper conclusions based on that. If, for example, a pair of confounded factors appears to be significant but we don't know exactly which one is responsible, we at least know where to focus next. And also, do not be afraid or discouraged if the first try is not successful; treat it as a lesson and work out why it happened. Don't give up; sit down, think about why it failed, what you can do differently, what you can improve, and then try one more time. Design of experiments can be fun with the proper attitude, because this experiment really was fun. As I said, it was August, it was very sunny, so it was a nice time to spend time on the [inaudible 00:27:04]. Thank you for your attention. Yes, thanks very much and thanks, Weronika.
Labels (8):
Advanced Statistical Modeling
Basic Data Analysis and Modeling
Consumer and Market Research
Design of Experiments
Mass Customization
Predictive Modeling and Machine Learning
Quality and Process Engineering
Reliability Analysis
Revisiting Pearl’s Influenza Studies Using JMP® (2023-EU-30MP-1319)
Saturday, March 4, 2023
Roselinde Kessels, Assistant Professor, University of Antwerp and Maastricht University Chris Gotwalt, JMP Director of Statistical Research and Development, JMP Guido Erreygers, Professor of Economics, University of Antwerp In 1919 and 1921, Raymond Pearl published four empirical studies on the Spanish flu epidemic. He explored the factors that might explain the epidemic’s explosiveness and destructiveness in America’s largest cities. Using partial correlation coefficients, he tried to isolate the net effects of the possible explanatory factors, such as general demographic characteristics of the cities and death rates for various diseases, on the variables measuring the severity of the epidemic. In this presentation, we revisit Pearl’s data and apply variable selection with a pseudo-variable in JMP®'s Generalized Regression platform instead of Pearl’s correlation analysis. We use Poisson forward selection on the variables in the platform with AICc validation. We find that our results largely correspond with Pearl’s conclusions but contain some additional nuances that are substantive. This paper contributes to the literature showing that Poisson regression proves useful for historical epidemiology. JMP’s Generalized Regression platform offers researchers much flexibility in choosing the most appropriate model. Okay, welcome everybody to this presentation in which I will talk about Pearl's Influenza Studies. Pearl a biostatistician, he lived quite a while ago during, specifically, the Spanish Flu, and he was the first one to analyze the weekly data that was collected by the US Bureau of the Census about the Spanish Flu, which occurred in 1918, 1920, and this for the large American cities. Pearl, he was the first to do so to analyze the data regarding the Spanish Flu, and he wrote two reports about it, his influenza studies one and two, which we will revisit now during this presentation, and we'll see how we will be able to look into the data, analyze the data using JMP. This is joint work with Chris Gotwalt, who contributed to the methodological component, and Guido Erreygers, who initiated the idea of revisiting Pearl's influenza studies. The overview of this talk is as follows. First, I'll discuss a little bit the Spanish Flu, what it was all about. Quite a deadly pandemic at that time. Then I'll introduce Pearl's Influenza Studies, and I'll talk a little bit about the data that Pearl used in his analysis. We have added some census data from 1910 on top of that for our analysis, which consists of a variable selection procedure with a null factor, a random factor, an independent normal factor in our analysis, and bootstrap simulation from the data. Then we'll be discussing our results and compare those results with Pearl's results, and then we'll conclude. First of all, the Spanish Flu pandemic, to frame that a little bit, it was one of the deadliest pandemics in history, as witnessed by this list here showing the world's deadliest pandemics in overtime. You can see that the Spanish Flu here ranks fifth in terms of the number of deaths that it caused. T his list is headed by the Black Deaths, which killed about four times more people than the Spanish Flu did. Then below this list, you see COVID- 19 pandemic appearing, which we got all exposed to. It also appears in this list still. 
Just as with COVID- 19, gatherings were more encouraged to happen outside, and also at the times already of the Spanish Flu pandemic there, that was the case as well, and even for classes to happen outdoors, as you can see on this photo. The Spanish Flu pandemic consisted of three waves. The first one started in March 1918 in the US and spread to Europe and to the rest of the world. Then a more severe wave, the severest one started in August 1918 in France and spread rapidly to the rest of the world, as well as coincided with the end of the First World War. Then a third one which was less severe than the second one, but more severe than the initial wave, started at the early beginning of 1919 and hit some specific countries like Australia and Great Britain. Here you see the timeline of the three waves of the Spanish Flu pandemic occurring in the US, more specifically because we have Pearl's data from the US which we look into. The death toll was humongous and most deaths occurred actually in India and China. What was specific about the Spanish Flu pandemic was that many young healthy adults died as shown by a W pattern of mortality rates, which is here shown by the full black line. In contrast to the U shape of the normal seasonal influenza and pneumonia mortality rates which were registered prior to the pandemic. For seasonal influenza and pneumonia, this U shape shows that individuals younger than four years of age and people older than 75 years, that these were most hit by seasonal influenza and pneumonia. Characteristic of the Spanish Flu pandemic was that besides those two age groups, also the young adults in the range between 20 and 40 years of age got specifically hit by this pandemic, eventually leading up to a huge death toll. Then further onwards throughout history, epidemiologists and historians of epidemiology have applied statistical analysis to the Spanish Flu pandemic. More specifically, people worked on the US as well, namely, Markel and colleagues, they studied the effect of non- pharmaceutical interventions in American cities like school closure, cancelations of public gatherings. Mamelund studied the influence of geographical isolation of specific areas and specific regions in the US. Clay, Lewis, and Severnini studied cross- city variation of 438 US cities, but also elsewhere around the world, so not only in the US but also in Europe, like in Norway and Spain. Data about the Spanish Flu pandemic were analyzed together with census data and so forth. The Spanish Flu pandemic kept be intriguing for many researchers over time. About Pearl's influenza studies now, here you see the first of his influenza study, the beginning of it. First, he talks about the severity of the outbreak and actually that the death toll for the United States was set at about 550,000, which is about five times more than the number of people who actually died during the First World War. Hence, showing the severity of this pandemic and actually how deadly it was, how explosive it was. That was specifically what Pearl was interested in examining and relating to other variables. The second of his influenza studies looks as follows, so the beginning at least. That's an update of the first of his studies because he got some criticism after first study by some peers who criticized him for not being accurate with certain variables. He had issues with what we now refer to as construct validity. 
Some of the data were not really measuring what they were supposed to measure, so they were not so accurate and they could be defined more appropriate, more accurate in order that they really measure what they should measure. He did that. He tackled the data again, the variables. Set up new definitions of the variables and moved forward. The data for the first of his studies looks as follows. We have data from Pearl of 39 American cities. The response variable he wanted to study was the epidemicity index, which is given by the symbol I 5 in his studies. That was the first response variable he wanted to study and he wanted to also find other variables like demographics that could be predictive of this response variable. Now, this response variable was a measure for the explosiveness of the outbreak in terms of epidemic mortality, so how explosive the outbreak was and hitting the various cities in the US. He defined the peak time ratio, so the peak of the excess mortality rates divided by the peak date. In this way, he wanted to compare cities with one single very sharp peak to cities with a long flattened curve of excess mortality rates. He devised this epidemicity index himself. He wanted to relate this epidemicity index to various factors like the demographics. First of all, the population density for 1916, and then the geographical position, which is a straight line distance from Boston. He also included the age distribution, chi- square, which is an age constitution index showing the deviation in the age distribution of the cities from a fixed standard age distribution. He also studied the percentage population growth in the decade 1900, 1910. Besides the demographics, he also involved the death rates for 1916, so prior to the pandemic. First of all, a n aggregate death rate from all causes and then specific diseases, death rates for specific diseases, namely pulmonary tuberculosis, organic heart disease, acute nephritis and Bright's disease, or failure of the kidneys, influenza, pneumonia, typhoid fever, cancer, and measles. There was quite some correlation between these death rates. I already told you a little bit about the response variable that Pearl developed, namely the epidemicity index. That did not arise into just a single attempt, but he started actually from I 1. First of an epidemicity index that he further improved into I 2, I 3, I 4, and then he was happy enough to move along and to work with the I 5 epidemic ity index. To distinguish then cities with single, very sharp peaks to those with long, low flat curves of epidemic mortality. He updated the variables from his first study and the second study, and in that sense, he also updated, he also modified the epidemicity index and the new epidemicity index. He referred to that one as I 6. As another variable of interest, he also defined to this destructiveness variable as containing excess mortality rates. Then he wanted to relate these two responses in his second study with the normal death rates, which were the mean death rates over the three years, 1915 to 1917. Then he also brought in the demographics. Again, he modified the age constitution index based on the 1910 census data, and he used that. Then also he involved the sex ratio of the population of males versus females. The population density of 1910 is used in the first of his studies, remained in the second study. Then instead of a geographical position, he used latitude and longitude in his second study. 
Then lastly, he used the percentage population growth in the decade 1900, 1910, again. These were Pearl's data to which we added some additional census data from 1910 which are given over here. Instead of the age constitution index, we used the share of the different age groups, and the pure numbers actually, because we were not really happy with the way Pearl defined the age constitution index. It was really quite complex and not clear enough so that we thought to just go about and use the 1910 share ages instead of this age constitution index developed by Pearl. Besides that, we looked into the number of persons to a dwelling, the percentage of homes owned, the school attendance of population, 6 to 20 years of age, and the illiteracy in the population, 10 years of age and over, and see whether one of these factors or some of these factors could be predictive of the tree response variables in Pearl's studies. What was Pearl's analysis all about? Well, he got into multiple correlation. He studied all the data making use of partial correlation coefficients as well as the normal correlation coefficients of zero order, but he did that very rigorously. He had computed this all by hand and did this quite well actually and took into account various other factors in these partial correlations by having those other variables being constant. For the partial correlations and the other correlations, he also computed probable errors to find out whether these correlation coefficients were significant or not, so he did this quite well. Now, the analysis that we are using is the one in which we are going to actually select variables with a null factor and we are going to bootstrap our data. We are doing so because our P values for our data, which are unfortunately not orthogonal, they can become heavily biased. Since we are not really using nicely orthogonal data, the P values can become quite biased towards zero. It's always the danger with P values for observational data that unimportant factors all of a sudden become important and that the type one error rate is not under control. To solve the fact that the P values are not uniformly distributed anymore in the case of an unimportant variable, to solve that issue, we are going to involve a random variable, a null factor, an independent normal variable into the analysis and see to what extent it appears in our procedure of selecting variables. This idea is inspired by or originated from the JASA paper by Wu, Boos, and Stefanski in 2007. Specifically, what we have done is, well, we included a single null factor in the variable selection procedure and performed 2,500 bootstrap replicates for variable selection using JMP. Then we calculated the proportion of times each variable enters the model. Variables that enter as often or less than the null factor are ignorable, and variables that enter more often than the null factor, well, these are the ones that we are going to select as being predictive of the response variable. In JMP, we specified two new columns, and actually, one formula is needed. That's the formula for the bootstrap frequency column that you see here. The bootstrap frequency column is based on our null factor, which is an independent normal variable that's reinitialized each time during the bootstrap simulation. Based on this reinitialization, the frequency column gets updated so that we have a 100 % resampling of our sample size of our data with the fixed sample size as it is. 
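A heavily hedged sketch of the setup being described: the null factor is just an independent standard normal draw per row, while the authors' exact bootstrap-frequency formula is not reproduced here, only noted in a comment.

```jsl
// Add an uninformative "null factor" that is re-drawn whenever its formula re-evaluates
dt = Current Data Table();
dt << New Column( "Null Factor", Numeric, Continuous,
    Formula( Random Normal() )
);
// The bootstrap frequency column mentioned in the talk is a second formula column
// (formula omitted here) whose values form a fixed-total, 100% resample of the rows
// and are re-initialized on every replicate when the Simulate feature is used.
```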
Then for the variable selection, what kind of regression also would we apply that we could find out in the generalized linear model platform of JMP, because taking into account the frequency column also, if we do variance selection, we also need to do the distribution determination at the same time. It has to happen simultaneously. You can't do it apart based on the original data. Actually, the graph on the left- hand side is a little bit misleading because that graph contains the original data, but distribution determination you should actually do while doing the variable selection itself. Assuming a Poisson distribution, well, that assumption was not rejected, so we could actually move forward with the assumption of a Poisson distribution, which is also maintained in the literature throughout for mortality rates and so forth for analysis. It was not really rejected and it was good to assume such a Poisson regression for the analysis. Then also we applied Poisson regression in combination with variable selection for the epidemicity index I6. However, we switched to normal regression or less regression for the third response variable, the destructiveness variable. The way to apply this regression in JMP by means of variable selection is by using the generalized regression platform where we then define the response variable for analysis and put the frequency column into the freq box. Then as model terms in the construct model effects window, we involve all the variables, all Pea rl's variables, together also with our null factor or independent normal variable that we also included. Then we move forward with the regression procedure by selecting forward selection estimation method. As a criterion, we used the Akaike information criteria with the correction for small sample sizes to decide upon the final model to include the final model then to select with the selection of the variables. The solution part that you see here is based on normalized data to put them on the same scales, so scales and center data and also to diminish the effect of multicollinearity in the data because there's quite some. Then you see again, the original predictors popping up in the lower output. Then we select a model with the lowest Akaike information criteria. One after the other, the variables coming being selected by forwards variable selection based on the Akaike information criteria. As you can see in this output actually here, the null factor got into the model. The null factor completely unimportant, uninformative, so got into the model, which is not good, of course. We had to do this estimation method, this variable selection procedure 2,500 times, and we did this in JMP by right clicking on the estimate column and then hitting simulate and then selecting the frequency column as a column to switch in and out between the different bootstrap replicates so that initializations of the null factor were always being guaranteed between the different bootstrap replicates. Then finally, so we got the 2,500 model selections out of the bootstrap simulation. Then we computed the proportion that all of these variables got into the model or were given as non- zero estimates. We were represented by non- zero estimates. Especially, of course, we were interested in how many times the null factor appeared in the selection of the models. This turns out to be 41 % of the times, which is quite high. T hese are our new false entry rates, actually. A very high percentage in which the null factor got selected. 
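Summarizing the Simulate output in the way described might look like the following hedged sketch; the table name and layout (one column of bootstrapped estimates per model term, one row per replicate) are assumptions.

```jsl
// Proportion of bootstrap replicates in which each term received a nonzero estimate
sim = Data Table( "Simulated Estimates" );          // hypothetical name of the Simulate output
For( j = 1, j <= N Col( sim ), j++,
    col = Column( sim, j );
    nm  = col << Get Name;
    est = col << Get Values;                        // estimates across the 2,500 replicates
    pct = 100 * Sum( est != 0 ) / N Row( est );     // % of replicates in which the term entered
    Write( nm, ": ", Round( pct, 1 ), "%\!n" );
);
// Terms whose percentage does not beat the null factor's are treated as ignorable.
```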
We accounted for some upper bound, an upper 99.9 % confidence limit also that we took into account, and even a little bit higher sometimes. To be assured that the factors that you see in green that they appeared most often then in the model selection. As you will see also another example now too, where we got tied with the null factor as well. The factors or the variables in reds, they have not been selected since their occurrence is lower than the occurrence of the null factor over the different bootstrap simulations. The variables of interest here that got selected and are predictive of the epidemicity index I5 are the death rates of causes, death rate from organic heart disease, the death rates from pneumonia, death rate from cancer, death rate from measles, and the geographical position. Now, the death rate from all causes, that's an aggregate for the death rates from the individual- specific chronic diseases. We were also interested to see what would happen if we were to take this out of the analysis. This is what we did on the following slides. We took it out, death rate all causes, and then we got a different picture. The death rate from measles and the death rate from cancer turned out to be unimportant now, whereas death rate from pneumonia got in the green zone, as well as the death rate from pulmonary tuberculosis. These were also quite highly correlated with the death rate of all causes. Death rate of all causes masked. These variables, although some of the death rates are also correlated among each other, so we have to be careful here. That was a new result that we saw. Now, eventually, we repeated these variable selections further onwards with only the variables in green that you see here on the screen, so the ones that got selected to which we added our new 1910 census data, like the shares of ages and the illiteracy and the schooling and so forth. We did this with and without dead rate or causes to finally then select the variables that were common kinds of overall analysis. All the different analyses selected some other variables still, but still there were some variables which were present all the time, and these are the ones that we finally then contained into the model, which is our final Poisson regression model that we obtained in the end. Which contains death rates from heart disease, organic heart disease, death rates from all causes we kept telling, as also Pearl stressed that as being quite important. Share ages, 0 to 4. School attendance of the population, 6 to 20 years of age and the geographical position. That was our final regression model for the first response, the epidemicity index I5, and below you see the results of Pearl. Having done a correlation analysis, he was able to identify pulmonary tuberculosis, organic heart disease, and acute nephritis and Bright's disease. Besides the death rate of all causes that he deemed quite important, which was actually also always the case for our analysis. It always came on top of our analysis. We kept it in. Now, Pearl also pointed towards some specific chronic diseases, but we were not able to put these on top. We did not find them to be stably present. Overall, our analysis, so we did not identify them. Also, Pearl, actually, after his first study, got criticized because of pointing towards these individual diseases. 
Then in the second of his study, also, he was more prudent, more conservative, and only pointed towards the death rate of all causes and the organic diseases of the heart as the ones that are predictive for his modified index of the explosiveness of the outbreak I6. Our final analysis is the one that you see here pointing towards also death rates from organic heart disease, death rate from all causes, the share of the ages between zero and four, and the population density rule. W e're always able to see the population density coming up high in the list of variables that occurred almost continuously over the bootstrap replicates. That also turned out to be an important variable from our analysis. With the destructiveness variable, we did not find all that much of variation. The range actually of values was not as large as for the other two responses, the epidemicity index . We were only able to identify a few variables and specifically the death rates coming from the organic heart disease. Then we identified, besides the share of the youngest people between 0 and 4 years, also the share of people ranging between 20 and 40, so 25 and 44 years of age. The healthy people actually prior to the pandemic, that was also indicative of excess mortality rates over the different cities in the United States. Pearl's data sets, as we could see also, we only had very little data. There were very tiny 39 observations in the first study and in the second study only 34 observations because they modified some of the variables, some observations got lost. The data are also observational with quite some multicollinearity involved. The quality of the data could have been better. Therefore, our analysis for sure are useful, but they are not magical, as well as Pearl's correlation analysis. Specifically, his first analysis, it was not supported. He knew afterwards that people did not really support his first analysis. Anyway, as George Box said, "All models are wrong, some are useful." We were able to select satisfactory models in a sequential manner. First of all, we included Pearl's variables and retained the selected variables, to which then we added new 1910 census variables to finally select those variables that are informative and each time with and without the death rate from all causes. Then we retained the variables that popped up in the green zone to be quite predictive of the response that popped up all the time, so over all the analysis that we did to then have the models that I presented. I hope this was informative for you to listen to and many thanks.
Labels
(12)
Labels:
Advanced Statistical Modeling
Automation and Scripting
Basic Data Analysis and Modeling
Consumer and Market Research
Content Organization
Data Blending and Cleanup
Data Exploration and Visualization
Design of Experiments
Mass Customization
Predictive Modeling and Machine Learning
Quality and Process Engineering
Reliability Analysis
0 attendees
0
0
0 attendees
0
0
Do You Trust Me? Assessing Reliability for Driver-assisted Autonomous Vehicles (2023-EU-30MP-1235)
Saturday, March 4, 2023
Autonomous vehicles, or self-driving cars, no longer live only in science fiction. Engineers and scientists are making them a reality. Their reliability concerns, or more importantly, safety concerns, have been crucial to their commercial success. Can we trust autonomous vehicles? Do we have the information to make this decision? In this talk, we investigate the reliability of autonomous vehicles (AVs) produced by four leading manufacturers by analyzing the publicly available data that have been submitted to the California DMV AV testing program. We will assess the quality of the data, evaluate the amount of information contained in the data, analyze the data in various ways, and eventually attempt to draw some conclusions from what we have learned in the process. We will show how we utilized various tools in JMP® in this study, including processing the raw data, establishing assumptions and limitations of the data, fitting different reliability models, and finally selecting appropriate models to draw conclusions. The limitations of the data include both quality and quantity. As such, our results might be far from conclusive, but we can still gain important insights with proper statistical methodologies. Link to CA DMV disengagement reports Link to AV Recurrent Events Paper Hello, my name is Caleb King. I'm a developer in the DoE and reliability group at JMP. Today I figured I'd showcase what I think is a bit of an overlooked platform in the reliability suite of analysis tools, and that's the Reliability Growth platform. I thought I'd do that in the context of something that's become pretty popular nowadays: autonomous vehicles. They're fast becoming a reality, not so much science fiction anymore. We have a lot of companies working on extensive testing of these vehicles. It's nice to test these vehicles on a nice track at your lab or something like that, but nothing beats actual road testing, which is why early in the 2010s, the state of California's Department of Motor Vehicles put together a testing program that allowed these companies to test their vehicles on roads within the state. Now, as part of that agreement, each company was required to submit an annual report detailing any type of disengagement incidents, or, heaven forbid, any crashes that happened involving their autonomous vehicles. Those had to be reported to the Department of Motor Vehicles, the DMV. Now, one benefit of the DMV being a public institution is that these reports are actually available upon request. In fact, we can go to the site right now and you'll see that you can at least access the most recent reports. We have the 2021 reports; they're still compiling the 2022 reports. You can also email if you want some of the previous ones; I did that with a brief justification of what I was doing, and they were pretty quick to respond. Now, we have different types of reports and different types of testing. We're focusing on testing where there is a driver in the vehicle and the driver can take over as necessary. This isn't a fully autonomous vehicle; you do have to be in the driver's seat. We're using these disengagement events as a proxy for assessing the reliability of the vehicles. Obviously, we don't have access to the software in these vehicles. If you worked at those companies, you could probably have more information. We obviously don't.
But they're a proxy because if you want your vehicle to be reliable, that means it needs to be operating as you intend within its environment. Any time you have to take over from the AI for some reason, that could be a sign that it's not exactly operating as intended, so we can use it as a bit of a proxy. Again, it's not the best approximation, but it's still pretty good. Of course, I'm not the first one to think of this. This is actually an informal extension of some work I've done recently with my advisor, Yili Hong, and a bunch of other co-authors, where we looked at this type of data from a recurrent events perspective. I'm going to take a slightly different approach here, but there is a preprint of that article available if you want to check out something similar. Let me describe the data for you real quick. I'm not going to be looking at every company doing testing; there are so many out there. I'm going to focus on one, and that is the events submitted by Waymo, which was Google's self-driving car project. Now they're their own subsidiary entity. These are their annual reports. Let me define what we mean by disengagement events. I'm in the driver's seat, the car is in autonomous mode, something happens, and I need to take over driving. That's a disengagement event: I disengage from autonomous mode. That could be for any reason, and they of course need to report what that reason was. We're just using that as our proxy measure here. These annual reports go all the way back to about 2015; that's when Waymo started participating in this program, and the 2015 report actually contains data back to 2014, so they start in the middle there. Each report essentially covers the range from December of the previous year to November of the current year. The 2016 report, for example, would contain data from December 2015 up to November of 2016. That way, they have a month to process the previous year's numbers. There are two primary sources of data we're looking at in each report. The first one lists each incident that occurred: when it happened, which could be as detailed as day and time or could just be the month (there's not a lot of consistency across years, which is something we ran into, but they at least give some indication of when it happened); possibly where it happened; and a description of what happened, which could be very detailed or could just fall into a particular category that they give. The second source of data lists the VIN, or partial VIN, of the vehicle, the vehicle identification number, something to identify the vehicle, and how many autonomous miles that vehicle drove that month. When I show this data later on, you might see a bunch of zeros. Zero just means they either didn't drive that vehicle or didn't drive it in autonomous mode; in either case, they were not doing active testing of the autonomous mode of the vehicle. Now, as I mentioned earlier, there was a bit of inconsistency. Prior to 2018, when they listed the disengagement events, they actually didn't give the VIN of the vehicle. We don't know what vehicle was involved; we know how many autonomous miles each vehicle drove that month, but we have no idea which vehicle was involved in an incident. Starting in 2018, that information is available, so we can match the vehicle to the incident, which means I'm going to do this analysis at two different levels.
One is at an aggregate level, where I'm going to look at each month across all of the vehicles being tested at that time, and then look at the incident rates overall as an aggregate measure. The second is zooming in at the vehicle level; I'll look at it by VIN. For that data, I'll only be going back to 2018. For the aggregate level, I can take all of it. Now, before we get to the analysis, I wanted to show you some tools in JMP that allowed us to quickly process and accumulate this data, to show you how easy it is and to show off a few features in JMP. Some of them are really new, some have been around for a little while. Let me start by showing you one thing that helped us, and that was being able to read in data from PDFs. Prior to 2018, a lot of these data were compiled in PDFs. Afterwards, they put them in an Excel file, which made it a lot easier to just copy and paste into a JMP table. But for those PDFs, how did we handle that? Let me give you an example using data from 2017. This is actually one of the best formatted reports we see from companies: some summaries here, some tables here and there. This, in Appendix A, is the data I'm looking at. You can see here, these are the disengagement events. We have a cause, usually just a category; they have the day, which is actually the month, a bit of a discrepancy there; the location; and the type. This is basically just telling us how many disengagement events happened each month. Then we have a table, or actually a series of tables, here at the back. This shows us, for each vehicle (in this case it only gives partial VIN information; there is not a lot of detail available in these early reports), how many autonomous miles were driven each month. How can we put this into JMP? Well, I could just copy and paste, but that's a bit tedious. We can do better than that. I'm going to go to File, then Open. There's my PDF. I'm going to click Open, and JMP has a PDF import wizard. Awesome. What it's going to do is go through each page and identify whatever tables it finds there. It categorizes them by the page and the table on that page, and when you save it out, you can of course change the name. Now, I don't want every table on every page. I'm going to go to the red triangle on this page and say, "Ignore all the tables on this page; I don't want these." I'll say okay, and I'll do the same here; it's a nice summary table, but it's not what I want. Then I start saying, "This is the data I want." Now, we notice here that this is formatted pretty well; it's gotten the data I want. If I scroll to the next one, this is technically a continuation of the table from before. However, by default, JMP assumes that every table on each page is its own entity. To tell JMP that this is just a continuation, I go to the table on that page, click the red triangle, and say that the number of rows to use as a header is actually zero. That's the way to tell JMP it's a continuation of the previous table. We'll check the data here, and it looks like it got it now. I'm going to check here at the bottom and I notice, "Oh, I missed that October data. That's okay, I'm going to do a quick little stretch there and boom, I got it." That's okay; you can manipulate the tables.
If it didn't catch something, you can stretch and manipulate the table region to adjust it. You can also add tables it didn't find. In this case, I missed this, so I'm going to drag a box around it. Boom, there's a new table for you, JMP. I'm going to go in here; it assumes that there are some header rows, and actually there are none. Okay, great, now it's captured that part of the data. There's a bit of an empty cell here; that's just a formatting artifact because this is technically two lines, so they didn't put this value at the top. It's an easy fix on the back end. Now for these tables, what we notice is that it actually thinks this is all one table. Unfortunately, that's technically not correct, because there are two tables, but it's an easy fix. I can simply go to each one and say it's actually not a continuation, JMP; this one has its own header. It says okay, and you can do that for each of these tables. I won't do it for all of them; I'm just doing this to illustrate. We'd probably end up with a bunch of tables here that we'd have to concatenate horizontally; that's just the way they decided to format the report. But JMP has a lot of tools to help us with concatenating and putting tables together. You can see this is a lot easier than trying to copy and paste this into JMP and making sure the formatting is all good; JMP does a lot of that for us. Okay, another helpful tool that came out recently, in JMP 17, is one you've probably heard of: the JMP Workflow Builder. That was super helpful, because obviously we have multiple reports over multiple years, and we'd like to combine all the years into two tables, one with the disengagement events and one with the mileage. What we did is follow some steps to set up each table in a way that we could then concatenate them together into one table, and we saved those steps as a workflow. That's what I have demonstrated here; this is the workflow we put together for that. I'm going to demonstrate it using this data set, this particular one being for our mileage data. What we have here is a table representing the raw output from one of the reports. Here we have it broken down by VIN, and we've got a lot of information. We'd like to reformat this table. I'm just going to walk through each step without showing too many details; they're pretty self-explanatory. This first one changes the name of this column to Vehicle, so that it matches a column in our concatenated table. Next, I'm going to delete this Total column; I don't need that. Then I'm going to do a stack across all the dates. You can see I've got that here; we conveniently called it "stacked table", very informative. Now, one thing to point out: I put a pause here, that's the little stop sign, because I would usually need to go in and change the year. I couldn't really figure out a way to define a variable, say Year, that you could fill out once and have it automatically filled in here. That's maybe something I can take to community.jmp.com, go onto the wish list, and say, "Hey, it'd be nice if I could do this." But for right now, I just put in the years. It was pretty easy to do compared to repeating this multiple times. Pretty straightforward.
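As an aside, the reshaping steps just described (rename the identifier column, drop the yearly total, stack the month columns into rows, and turn the month labels into real dates) could be sketched outside JMP in pandas roughly as below. The file and column names are hypothetical, not the ones in Waymo's reports.

```python
import pandas as pd

# Hypothetical raw mileage table: one row per vehicle, one column per month, plus a Total column.
raw = pd.read_csv("waymo_mileage_2017.csv")        # hypothetical file name
year = 2017                                        # the value typed in at the workflow pause

tidy = (
    raw.rename(columns={"VIN": "Vehicle"})         # step 1: rename the identifier column
       .drop(columns=["Total"])                    # step 2: drop the yearly total
       .melt(id_vars="Vehicle",                    # step 3: stack the month columns
             var_name="Month", value_name="Autonomous Miles")
)
# step 4: make real dates; December belongs to the prior year, since each report
# covers December of the previous year through November of the current year.
tidy["Date"] = pd.to_datetime(
    tidy["Month"] + " " + tidy["Month"].eq("December").map({True: str(year - 1), False: str(year)}),
    format="%B %Y",
)
```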
But again, I can also highlight for you how you can go in and adjust the script itself; you can tailor this to your needs. What this next step does is recode these values so each shows the month and the year. I'll do that real quick. There we are. The next step takes this column, which right now is a category, a string, and makes it a number. Now, this isn't pretty; this is just the number of seconds since JMP's base date back in 1904. Obviously that's not pretty, and I'd like to show something more informative, which is what I do in the next step. Now it shows the month and the year. I'm going to stop here, because at this point I'd have another table open; this next step would concatenate the tables and then close off the intermediate tables. What I'm going to do instead is reset everything. I'll reset, click here, and reopen this table. I'm doing this just so you can see how fast this goes. I'm going to click over here, click Play, and click Play again. Look how fast that was. Now imagine doing this for multiple reports; that's how much faster it is than repeating the same steps over and over and over again. This workflow was really helpful in this situation. Now, I'm going to close all of these out, because it's time to get into the analysis. Let's do that. I'm going to start with the aggregate-level data. Here's my table, compiled across all the time periods. We have the month, we have how many disengagements happened in that month, I've got a column here for the cumulative total, and I've got how many autonomous miles were driven. I've got two more columns here that I'm going to talk about in just a second; you'll have to wait. What I'm going to do is go to Analyze, then Reliability and Survival, and scroll all the way down until I reach Reliability Growth. I'll click that. Now, we have multiple tabs here. I'm only going to focus on the first two, because the last two concern the case where I have multiple systems; I'll revisit those when we get to the actual vehicle-level information. For right now, let's pretend that we're looking at the whole AV system, the artificial intelligence system in these vehicles, as one big system. There are two ways I can enter this. One, I can do time to event, essentially how many months (or days, if we had that) until a certain event happened. Or I could do it via a particular timestamp: basically, what was the time at which it occurred? I have that type of formatted data, at the month level, and the month is a fine timestamp; it just says that in that month I had this many events happen. That's all I need to put in; I have all the data I need, so I'll click OK. Now, a great thing about this is that before you do any analysis, you should of course look at your data, visualize your data, and it's nice because the first thing the platform does is visualize the data for you. Let's look at it. We're looking at cumulative events over time. What we expect is a behavior where early on we might have what I'll call a burn-in type period, where I have a lot of events happening; I'm tweaking the system, fixing it, improving it. Then ultimately what I'd like to see is a plateau: I'd like it to increase and then flatten off, which tells me that my number of incidents is decreasing. If it goes completely flat, that's great.
That would mean I have no more incidents whatsoever. I wish the data were like that; it is not. But we can see patterns here in the data. Let's walk through it. We have a burn-in period here, early 2015, and then by about mid-2016 we flatten off until about here. We see a little blip about summer of 2016; something happens and we get a few more incidents. We level off again until we get to about here, late spring of 2018. Something else happened, because we start going up again, though not very steeply, and this stretch is a bit longer. Then at about here we almost flatten out; we've reached a period where we're having essentially no incidents, until the end of 2020. Then something happens in 2021, and we've reached essentially another burn-in period; something's going on. Essentially what we've got is four phases, if you will, in the growth of the system: something has changed two or three times in a way that impacts the reliability. Another way to visualize this: I'll run this plot, which uses some of that extra data. I'm plotting again the cumulative total, and I'm also plotting something I call the empirical mean time between failures. It's a very simple metric to compute; it's just the inverse of the number of disengagements in that month. It's a very ad hoc, naive way to try to estimate the mean time between incidents, but I plotted it here so that you can see these four peaks that correspond to the bends in the curve, indicating the four places where something changed in the system and affected its reliability. What we can do then is try to figure out those breakpoints. One way you could do that is with a certain model the Reliability Growth platform can fit. I'll pause here to talk about the models a bit. All of these are actually the same model with slight modifications: they're all what we call a non-homogeneous Poisson process. That is a fancy way to describe a counting process: I'm counting something, but the rate at which the counts occur per unit time is changing over time. A homogeneous Poisson process would mean a constant rate, so the rate at which incidents occur stays the same; that would be equivalent to seeing a straight line in the cumulative plot. It's very easy to model, but it's a poor fit for reality, because obviously we don't want the rate to stay the same; we would like it to decrease to essentially zero. That's why we use a non-homogeneous Poisson process: we want the rate to change over time. Here we have a model where we can actually let JMP try to figure out a change point in the process. If I run it, it's going to catch this big piece and say, "Hey, something really changed there. For most of the record it was the same process, but after this point it really changed." Now, here it can only detect one change point at a time. I have talked to the developer about how nice it would be to identify multiple change points; apparently that's a bit of an open research problem, so he and I might work together to try to figure that out. What I did instead is essentially eyeball it and say, "I think there are about three or four phases," and I did it empirically, which is where you get this phase column. I'm going to run that script. Let me show you how I did it: I come here under Redo, go to Relaunch, and all I did was add the phase column here. This tells the platform that there are different periods where the reliability might have changed in some way.
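The reliability growth models being discussed are, broadly, variants of the power-law non-homogeneous Poisson process (the Crow-AMSAA model). As a simplified illustration of what is being estimated, the sketch below computes the standard maximum-likelihood estimates for a single phase from exact event times, plus the speaker's "empirical MTBF" (the inverse of the monthly disengagement count). It is a stand-in for the idea, not JMP's implementation, and the input numbers are made up.

```python
import numpy as np

def crow_amsaa_mle(event_times, total_time):
    """Power-law NHPP (Crow-AMSAA) MLEs for time-truncated data.
    Cumulative events N(t) = lam * t**beta; intensity u(t) = lam * beta * t**(beta - 1)."""
    t = np.asarray(event_times, dtype=float)
    n = len(t)
    beta = n / np.sum(np.log(total_time / t))
    lam = n / total_time ** beta
    return lam, beta

def instantaneous_mtbf(lam, beta, t):
    """Mean time between failures implied by the fitted intensity at time t."""
    return 1.0 / (lam * beta * t ** (beta - 1))

# Made-up example: months (since start of testing) at which disengagements occurred.
events = [0.4, 0.9, 1.1, 2.0, 3.5, 5.2, 8.0, 12.5, 18.0, 25.0]
lam, beta = crow_amsaa_mle(events, total_time=30.0)
print(f"beta = {beta:.2f} (values below 1 mean the system is improving)")
print(f"MTBF at month 30: {instantaneous_mtbf(lam, beta, 30.0):.2f} months")

# The 'empirical MTBF' plotted in the talk: inverse of the monthly disengagement count.
monthly_counts = np.array([5, 3, 2, 1, 0, 1])          # made-up disengagements per month
emp_mtbf = np.where(monthly_counts > 0, 1.0 / monthly_counts, np.nan)
```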
With that in place, the key metric we're going to look at is the mean time between failures. Early on (this is in months), this here is about three days, this is about four to five days, this is about a week, and this is about a day to a day and a half. Early on we have a fairly short time between incidents; they're pretty frequent. We can also look at the intensity plot, which might be another way to interpret it. What we're looking for is a long mean time between failures, ideally infinite, meaning nothing ever happens, and an intensity that decreases. What we see here is that we get off to a good start: by about the middle of 2016 we're doing really well, in fact we get to about a week between incidents for any vehicle. There was a bit of a blip, but we mostly get back to where we were, until we get to the end of 2021, where now it's essentially about a day between incidents for any vehicle. Something big happened here at the end of 2020 with these vehicles, with this software system, if you will. Again, you can see it with the intensity: you can almost fit one curve, and we get down to about six or seven incidents per month, whereas here it's almost 30, essentially once a day. So we've been able to look in here and discover what's going on, at least at the aggregate level. Before we get to the vehicle level, I'm going to run one more graph. We've got all these autonomous miles; could it be that if I drive more often, I encounter more incidents? Could that have an effect? There's a quick way to assess that with a simple graph: we'll just plot autonomous miles versus the total disengagements. We see here that for a small number of disengagements, that might be true; the more you drive, the more you might see. But in general, long term, not really. There's no big, strong correlation between how many autonomous miles were driven and how many disengagements you see; there's something else going on. That's actually what we found in the paper I mentioned earlier: the mileage impact was very minimal. Now, let's zoom in to the individual vehicles. We're not going to have complete data for all of the vehicles, even though I do have the full table here. Let me break it down. I have the month and I have the vehicle identification number; notice some of these are only partial VINs. I have here what I call a VIN series. This is very empirical: I'm just taking the first four characters of the VIN. If I scroll down a bit, you'll see that a lot of them actually start with the same four characters, 2C4R, so I'll call them the 2C4 series. This identifies a particular fleet of vehicles, at least from an empirical view. If I scroll down further, we run into a different series, which I'm going to call the SADH series. This is the one that was introduced about 2021; that's when I saw the VINs change to the SADH designation. Again, I have how many miles were driven, and I have a starting month: when did that vehicle start? I'm going to use this to compute the times to events. First, I'm going to do a plot; I think this is the most informative plot you'll see for this analysis. What I've done here is essentially create a heat map for you; you can see I've got the heat map option selected.
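A vehicle-by-month activity heat map like this one could be assembled outside JMP along the lines sketched below; this is only an illustration with hypothetical data and column names, not the speaker's actual table.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical long-format mileage table: one row per vehicle per month.
tidy = pd.DataFrame({
    "Vehicle": ["2C4R-01", "2C4R-01", "2C4R-02", "SADH-01"],
    "Date": pd.to_datetime(["2018-01-01", "2018-02-01", "2018-01-01", "2021-03-01"]),
    "Autonomous Miles": [120.0, 0.0, 85.5, 40.2],
})
tidy["Driven"] = (tidy["Autonomous Miles"] > 0).astype(int)   # 1 = driven in autonomous mode that month

activity = tidy.pivot_table(index="Vehicle", columns="Date", values="Driven", fill_value=0)

fig, ax = plt.subplots(figsize=(10, 4))
ax.imshow(activity.values, aspect="auto", cmap="Greys", interpolation="nearest")
ax.set_xticks(range(activity.shape[1]))
ax.set_xticklabels([d.strftime("%Y-%m") for d in activity.columns], rotation=90)
ax.set_yticks(range(activity.shape[0]))
ax.set_yticklabels(activity.index)
ax.set_title("Months each vehicle was driven in autonomous mode")
plt.tight_layout()
plt.show()
```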
For each vehicle and over time, there's essentially a cell, and it just indicates: was this vehicle driven in autonomous mode at any time that month? I have it color coded by the series, and these vertical lines correspond to the transitions between those empirical phases I mentioned earlier. What this is telling us is, basically, can we identify what might have caused those transitions? Here we see an initial series of vehicles, and it looks like there wasn't a big change in which vehicles were introduced here; maybe there was a bit of a software upgrade for this particular series that introduced those new incidents. Here we see that a new series was introduced, a smaller number of vehicles, maybe a pilot series, and then a bunch of them were introduced about the same period where we saw the other transition. This seems to correspond to a new fleet of vehicles, with maybe a slightly updated version of the software. And here we see a clear distinction: obviously, in 2021 a completely new series of vehicles was introduced. We have a bit of the old vehicles still in the mixture, but most of them are the new vehicles. That probably explains why we got a new batch of incidents: we got a burn-in period for this new series of vehicles. This is cool, because now we have a bit more explanation of what was going on in the aggregate data, which is why it's important to have this information. Now let's break it down by VIN. I have a script right here; we've got a table that's similar to the previous one. Notice some of the rows have been excluded, and this is because, if for a particular vehicle the total number of incidents was less than three, the platform is not going to be able to fit a model, because it needs at least three incidents per vehicle. That makes sense: if I have only one or two, that's not really enough information to assess the reliability. If I have three or more, now we're talking; I can do something. I also have the months since the start, some cumulative information, and which month it started. I'm going to go ahead and run the platform. Don't worry, I will show you how I did this: I'm going to go to Redo, Relaunch, and get rid of some leftover stuff. I'm looking at one of those two last tabs, the ones about multiple systems; now we're thinking of each vehicle as its own system. The concurrent option would mean running each vehicle one after the other; that's not what's happening here. The vehicles are essentially being run in parallel; multiple vehicles are being driven at a time. Here I need a column to indicate the time to event, in this case how many months since the start until this many events happened, and I have the system ID. The one thing that's not shown here is that I actually took the VIN series and used it as a By variable, which is why we have the "where VIN series equals" so and so. There are only going to be two, because the one with the little asterisk has no incident information; that's because it was from earlier, prior to being able to tie the VIN to the incident. It was there for completeness, so I cancelled out of that one. Now, what we see here is a list of models that you could run. These first four I'm not going to be able to do, because I only have one phase; essentially, the phase now corresponds to the VIN series.
But there are two that I could run, and the only difference is: do I want to fit one model for all of them, saying these are all essentially identical systems? That makes sense; these are all vehicles, they probably run the same software, so maybe I can run one model for all of them. Or I can fit a model to each individual vehicle. Before I run those models, let's take a look at this plot. Again, start with the visualization: what we see plotted is all the vehicles. Notice there's a fairly shallow slope here; essentially there's a bit of a steep curve at first, but then it levels off pretty quickly. That's a pretty good sign for reliability. I'm going to compare them; I'm going to scroll down to the next one, the SADH series. Now, just so you know, when you run it yourself the axes are only going to cover that series' own data, which would be a smaller range here; I fixed them so both plots have the same range. You can clearly see that this one is much steeper: clearly we have more incidents happening with this new series than with the first one. But we can do a quick model fit. I'm going to fit the identical-system model; again, it's a non-homogeneous Poisson process, although in this case I'm going to ignore the estimates for right now; if you want to look at them, you can. I'm going to go straight to the mean time between failures, and you'll notice that for all the months it's pretty much flat. What it's essentially fit here is just a homogeneous Poisson process: the rate is constant, which is good for modeling, not so good for assessing improvement. It's saying that across this whole time, for this particular series, any one vehicle averaged about five months between incidents. Now, compare that to what we saw with the aggregate, where it was about a week between incidents across any vehicle, whereas this seems to imply about five months for any one vehicle. You can think of the vehicles as running in parallel and staggered: any one vehicle might go about five months between incidents (that's an average, and there's a lot of range in there; for one vehicle, that's a pretty long time), but in aggregate they're staggered enough that across the fleet it looks like about a week between incidents. The two views are consistent like that. And this is still pretty good: about five months between disengagement events for that series. If we turn to the SADH series and fit the same model, we see the mean time between failures clearly changing; I'm going to hide that other curve. If we look at this, early on we probably had about two months between incidents, which is a decent start, but we've dropped to less than a month, almost two to three weeks between incidents. Clearly, there's a bit more work to do on this series. Again, it was just introduced, so this is probably more of the burn-in phase; if we get the 2022 data, we might start to see it level off like it did in the previous series, and this curve might flip and flatten out. That's about all I want to show for these models. I can show you some of the individual distinct systems, but there are a lot of systems here, so it gets crowded very quickly: there's a plot for each one, there are estimates for each one, and you can look at the mean time between failures for each one.
If there are particular vehicles you want to call out and see how they might differ, this is where you can do that. You can see some increase, some decrease, but overall it's more or less flat. You can also look at intensity plots if you find those more interpretable than the mean time between failures; there are other metrics you can incorporate here. Okay, that's all I want to show for this platform. Now, of course, there's data I didn't include here. For example, we could break it down by cause. For some of this data, the cause might be that the driver just needed to take over because the car was getting too close to the side of the road. Or maybe the car stopped at the stop sign, did what it was supposed to, started rolling, and some other driver blew through the stop sign coming the other way; in that case, it might not necessarily be a reliability hit, because the car did what it was supposed to and somebody else wasn't doing what they were supposed to. It would be interesting to break it down by that, and also by location: do you get more incidents in the city than on the highway, something like that? Real quick, we should also look at the mileage impact at the vehicle level. Again, same story: for one or two incidents, sure, that might matter, but overall it's going to be flat; the mileage impact on the incident rate is minimal. Of course, this is just one of many platforms available in the reliability suite; you can see there are a ton of options, very flexible for helping assess reliability. Again, that's all I have to show you. Hopefully I've been able to demonstrate how well JMP can help initiate discovery and analysis, and hopefully you discovered a few things about this particular company's autonomous vehicles. I hope you enjoy the rest of the conference. Thank you.
Labels
(10)
Labels:
Advanced Statistical Modeling
Automation and Scripting
Basic Data Analysis and Modeling
Consumer and Market Research
Content Organization
Design of Experiments
Mass Customization
Predictive Modeling and Machine Learning
Quality and Process Engineering
Reliability Analysis
0 attendees
0
0
0 attendees
0
0
Generating and Evaluating Lot Acceptance Sampling Plans using JMP® (2023-EU-30MP-1315)
Saturday, March 4, 2023
In a regulated environment, systems are put in place to ensure product safety, efficacy, and quality. Even though, as Harold F. Dodge said, "You cannot inspect quality into a product," the Code of Federal Regulations, 21CFR820.250(b), states, "Sampling plans, when used, shall be written and based on a valid statistical rationale." Lot acceptance sampling plans (LASP) provide the statistical rationale and have been used in many industries. Practitioners, however, usually rely on the tables in ANSI/ASQ Z1.4 (attributes) and Z1.9 (variables). Since the operating characteristic (OC) curve is the best way to evaluate and compare lot acceptance sampling plans, an add-in was needed to facilitate this process. The evolution of appropriate sampling plans within the biotech and medical device industries that balance customer design requirements and technology opportunities led us to develop an interactive JMP add-in for designing and evaluating valid lot acceptance sampling plans for attributes and variables. During this highly interactive session using JMP, we will demonstrate how to use the add-in to efficiently design and evaluate lot acceptance sampling plans by showcasing its flexibility and ease of use. Live from 100 SAS Campus Drive, your one and only SAS world headquarters in Cary, North Carolina, it's The Sampling Plan Show starring Stan Koprowski. And now here he is, the host of The Sampling Plan Show, Stan Koprowski. Thank you. Thank you very much. You're too kind. Thank you. We love you. No, that's enough. Please. All right, we have a great show for you today. Before we get started, just a little background here: I wasn't very good at statistics. In fact, I got a paper cut from my statistics homework. What are the odds? Enough of that. Today we're going to talk about acceptance sampling and sampling plans. We're going to learn about the OC curve and what it is. I'll be honest, this involves one of my all-time favorite ANSI standards; we're going to hear about the ANSI standards. We'll make some predictions for the big sampling plan, and then finally we'll show you some fantastic highlights using the JMP sampling plan add-in with one of my all-time favorite industrial statisticians, Dr. José Ramirez. I'm glad to be here and proud to be part of this show; I think it's going to be fun. Sorry to hear about statistics being hazardous to your health with that paper cut. Man. It was a rough journey there, but we'll get through it. As you see, here's the title of our talk, and then I'll pull up some other slides here for you. Your book, I like this book; we use it a lot in the division: Statistical Quality Control: The JMP Companion. This is the companion book to Doug Montgomery's book; I think you're going to mention Doug's book later in the talk as well. That's true. Then I have your other book up here too, Analyzing and Interpreting Continuous Data Using JMP. I know you've been a long-time user of JMP, probably back since JMP 2. I think you probably had one of the first support cases that came into tech support, to our own Director of Customer Enablement, Jeff Perkinson. Jeff, as we all know, started out in tech support, and I'll have to look back through the cases, but from what I understand, you have been a long-time user of JMP. Welcome to our show. I'm glad you could be with us today. Super excited to have you. We're going to talk about some options here for folks to call in. If you want to call into the show or message us, you can message us at JMP Lot Plan.
The phone number is there if you have a rotary dial phone and want to call us up, or you can reach us on Instagram at JMP Lot Acceptance Sampling Plan. Let's go ahead and see; I think I do hear a call coming in. Wait a minute, let me see who that is. Yes, this is Dr. Julian Parris calling in. I'm the Director of JMP User Acquisition and I have a question for Dr. Ramirez. Go ahead, Julian. Dr. Ramirez, can you explain the difference between lot acceptance sampling plans and variables acceptance sampling plans? I don't really believe that was Julian. That was probably Julio from down by the schoolyard. Julio, do you have a question? Yes, he did; he was asking if you could give us a little introduction to sampling plans. Sampling, okay. He's doing some sampling by the schoolyard; I wonder what he's doing there. Julio, since you're by the schoolyard, at a school, let's go back to the dictionary and look at what it says about sampling. There are a few definitions here in the dictionary, and one thing I like about the first one is that it talks about a suitable sample. For statisticians, that has meaning: part of it is that the sample has to be representative. For those of you familiar with stats, the other piece that we include in a definition of a suitable sample is that it is random. But what you're asking about, these sampling plans, is the second definition, and they got it right, because a sampling plan is essentially deciding on a small portion of a lot or a population. The population can be either infinite, or it can be a finite number, like 10,000 items. We want to take a small sample from that so we can do some inspection. By inspection, we mean that we are going to decide the fate of that population, or the fate of that lot. Okay, I understand a little of what you're saying there. You're going to take a sample from a population, and then what's the difference between the sample types? It looks like you were talking about an inspection; is it always an inspection, or are there other types of sampling that you can do? Well, in general, when people think about lot acceptance sampling plans, there's some type of inspection, some type of checking that goes on. The way we're going to do this sampling is by applying some statistical principles [crosstalk 00:06:22]. Oh, gosh, that's scary. But actually that's what the agencies want. For example, if you look at some of the documents from the FDA, if you're required to do some type of sampling, they want you to do it in a statistical way. How are we going to do that? That's part of this show that we're having. But in the old days, and we're going to talk about those old days, people used standards, and those are the standards that are still used right now: the ASQ/ANSI Z1.4 and Z1.9. That sounds even scarier, but I'm a bit confused. When I was doing my research before I brought you on as a guest, I thought there were some military standards; I thought the sampling plans were based on military standards, and now you just told me there are some other standards in play here. What's the story with this? Well, yeah, that's true. These sampling plans have a long history; they go back to probably the 1930s and 1940s, and there were two military standards: MIL-STD-105E, which is the one that corresponds to Z1.4 and covers what is called, to use the technical term, sampling by attributes; and MIL-STD-414, which is the one that corresponds to Z1.9, sampling by variables.
You can think of it as discrete versus continuous. What happened is that those standards were taken over by the American Society for Quality, ASQ, and the American National Standards Institute, ANSI, and they rebranded them; now they're called Z1.4 and Z1.9, but they're still the same standards, the same tables that people use to generate sampling plans. Let's talk about how we do that. The way to understand this is that every type of lot acceptance sampling plan, L-A-S-P or LASP, has components and risks. The components of the plan are essentially: how many samples do I need to take from a population of 10,000, or infinity, to test? I'm going to do some inspection, and I want to know whether that sample indicates a pre-specified percent defective. We're going to define quality as a percent defective in the population, and what we're doing is taking a sample to see whether the lot contains that much or less. In order to determine whether we're going to pass or fail that lot, we also rely on the acceptance number, which tells us how many defective parts out of the sample we can accept under the plan. Anything greater than that and we reject, or fail, the lot. That's why, as I said before, we're determining the fate of the lot with this [inaudible 00:10:03]. Here we go. I had my fate decided a long time ago with that paper cut, so I'm a little anxious here. What kind of fate are we talking about? Yeah, we're going to decide whether we pass the lot and make it available for whatever purpose it was manufactured, or put it in quarantine [inaudible 00:10:31] and maybe do some more inspection, or try to understand why the fraction defective is larger than the prespecified one. You mentioned that there's some risk involved. What kind of risk? Is it out of my control or within my control? Whose risk is this? Yeah, there's risk, and that's the beauty of my profession: in statistics, you don't have to be certain about anything. I can be 95% confident, I can even go up to 99% confident; I don't have to be certain. Again, for those of you familiar with statistics, you know what we're talking about. There's a chance that a perfectly good lot is going to be rejected because, again, we're taking a small sample; we're not sampling the whole population. There's a risk that the sample leads us to say a good lot is not good, and there's a chance that a bad lot may be released. You can think in terms of false results, like a medical test. That's similar to taking a COVID-19 test: if you actually had it and the test came back negative, that's the kind of error we mean. Yeah, it's like that. We may have the case where the lot is bad but the sampling plan says it is good, or vice versa, we have a good lot and we're going to reject it because of the sampling [crosstalk 00:12:19]. That's why it's important to use statistical principles in designing these sampling plans, and that's part of why people use the standards: those were derived so that, when you select a plan from them, these risks are balanced. That's what we talk about in the generation of a lot acceptance sampling plan. We have these two risks: a good lot may be rejected, and a bad lot may be released. We're going to assign probabilities to these risks.
One thing we want to make sure of is that if the lot is good, we accept it most of the time. How do we define that? Well, standard practice is to use 95%. Again, that sounds like the 95% confidence you use in some type of statistical test. What we're saying is that we're going to predefine a fraction defective for that lot, and we're going to select a plan that guarantees, or almost guarantees, that 95% of the time a good lot is going to be accepted; a good lot is going to pass. On the other hand, we also want a high chance of rejecting a bad lot, or, if you flip that, a small chance of passing a bad lot: a 90% chance of rejecting it becomes a 10% chance of accepting it. So we're saying there's a 95% chance of accepting a good lot, but only a 10% chance of accepting a bad lot. Those are standard numbers in sampling plans, in the standards, and in the way people use those plans. You mentioned the user; the user has the option of changing those numbers. Rather than 95%, we can use 99%; rather than 10%, we can use 5%. However, if we get too greedy, then, as some of you may know, the sample size is going to increase, and that's part of the balance: we want to find a sample size that is small enough yet still guarantees these probabilities. Should we do some examples? Are there things we could do to explain some of the balance you're talking about between these two competing risks? Yes. It sounds like one risk might be on the consumer side, and maybe one is on the producer's side. Exactly. If you look at the 95% chance, that means there's a 5% chance that a good lot is going to be rejected. That's on the producer's side, because as a producer you don't want your good lot to be rejected; that's a loss of material or good product. On the other hand, the 10% chance of accepting a bad lot is on the consumer's side, because we don't want the consumers to receive something that is bad. Now, in order to do this, there's a tool that we use in sampling plans called the operating characteristic curve, or OC curve. Some people may be familiar with that. Those curves, and I'm going to show you some examples, are the ones used to find that balance between these risks and probabilities. What is an OC curve? An OC curve is essentially a plot that shows you the probability of accepting a lot as a function of the fraction defective in the population. If you look at this curve, the blue line: on the Y-axis we have the probability of accepting the lot, and on the X-axis we have the proportion defective in the population. As you can see, when the proportion defective is very small, there's a high probability of accepting the lot. As soon as that proportion starts increasing, the probability of accepting the lot decreases, and that's the shape we want to see in an OC curve. Now, remember we have two probabilities and two risks: we want a 95% chance of accepting a good lot and a 10% chance of accepting a bad lot. The definition of good and bad is in terms of the proportion defective in the population, and we, as users of sampling plans, have to define what a good fraction defective is. Of course, ideally the good fraction defective would be zero, but that would throw a [inaudible 00:17:22] in the math: if you divide by zero, you get infinity, meaning you would have to sample everything if you want perfection.
What we do is define a small number, and that is called the AQL, the acceptable quality level. On the other side, we define the RQL, or rejectable quality level. Granted, there are many terms that people use; sometimes you may see LTPD instead of RQL. That's the lot tolerance percent defective, the maximum percent defective that you can accept in your population. With the probabilities and those fraction defectives, you define two points on this curve. Here, the AQL is 1.8%, and we have a 95% probability of accepting that. What that means is, as long as the fraction defective in the lot is less than roughly 2%, there's a high chance of accepting that lot; but as soon as it goes much higher than that, we're going to accept that lot very infrequently. These two points are the ones you look at on the OC curve, and they are what determine the sample size and the acceptance criterion. Got it. I think it will help Julio if we run some examples. In fact, I think Julio is texting; he said it would be great if we could do an example. Okay, let's do that. Why did you develop, or why did we produce, a JMP add-in? Yes, Julio, what we're going to do is show you an app to do this. People sometimes ask us, "Why did you do this? Why did you spend a lot of time writing code and packaging it in an add-in?" Well, to tell you the truth, [crosstalk 00:19:30]. That print is really tiny; I'm going to have to get a new set of readers or a giant magnifying glass. Is that really how people do this? It may seem weird, but I believe that in some industries people still use the standard. Granted, it may not be the old book that they use, maybe a PDF, but sometimes they still use this. These tables are very tedious, and they're discrete in the sense that they only give approximations to the plans. There's a process to follow, and of course there are multiple tables you have to go through in order to find the appropriate sample size. What we wanted to do is make our lives easier, actually, because to be truthful, we use sampling plans too, and we were also tired of using these tables. So we wanted to automate the generation of the LASPs. Okay, got it. What else do we need to know before we do some examples? This is JMP, which is one of the greatest pieces of software out there for doing statistics and getting insights out of your data. Another thing we did is that we're not only automating the generation of the plans, making life easier, but also using all the visualization tools in JMP, like the profilers, to understand these OC curves. Remember, we just showed you that the way we determine or generate the plans is via an OC curve, and the OC curve is also very important for evaluating a plan. If someone gives us a sampling plan, how do we know whether it's good? That's part of this. Let's look at an example. This comes from Professor Montgomery's book; again, a shameless plug for us, we wrote a companion book for it, the one you saw at the beginning. If you look at this figure, this is in chapter 15, Professor Montgomery also shows another approximation, another way of generating a sampling plan, which is using a nomograph. A nomograph is this figure that you see here. You have to figure out what your AQL and RQL are, and the probabilities, and then you have to go in there and approximate the plan. They give an example where the acceptable quality level is 2%, or 0.02, and the rejectable quality level is 8%, or 0.08.
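For the binomial (type B) case being discussed, the OC curve has a simple closed form: for a plan with sample size n and acceptance number c, the probability of accepting a lot with fraction defective p is P_a(p) = P(X <= c), where X ~ Binomial(n, p). A minimal sketch of computing and plotting it, using the AQL and RQL from this example and a placeholder plan, is below; it is only an illustration, not the add-in's code.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

def prob_accept(n, c, p):
    """P(accept) for a single-sampling attributes plan (n, c) at lot fraction defective p,
    using the binomial (type B) model: P(X <= c) with X ~ Binomial(n, p)."""
    return binom.cdf(c, n, p)

n, c = 90, 3                 # example plan, just for illustration
AQL, RQL = 0.02, 0.08        # quality levels from the nomograph example

p = np.linspace(0, 0.15, 301)
plt.plot(p, prob_accept(n, c, p), label=f"n={n}, c={c}")
plt.scatter([AQL, RQL], [prob_accept(n, c, AQL), prob_accept(n, c, RQL)], color="red")
plt.xlabel("Lot fraction defective")
plt.ylabel("Probability of acceptance")
plt.title("Operating characteristic (OC) curve")
plt.legend()
plt.show()

# Compare these against the 0.95 and 0.10 targets discussed in the talk.
print(prob_accept(n, c, AQL), prob_accept(n, c, RQL))
```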
The probability of accepting a lot that has a fraction defective at the AQL of 2% or less is 0.95; as we said, that's high. The probability of accepting a lot at the RQL of 8% or more is only 10%. Again, these values of 0.95 and 0.1 are standard in industry. You'll notice on this diagram they put the 0.02 and draw a line here to intersect the 0.95; the 0.08 intersects there; and then you intersect those two lines and read the plan off, roughly. Okay, that's a 90, 3 plan, meaning, out of an infinite population, you take 90 samples, you inspect them all, and you can accept up to three defectives. If you see more than three defectives, you reject the lot; if you see three or fewer, you accept the lot. That's a sampling plan. A little cumbersome, though. Yeah, it looks a little difficult to line up exactly; I hope we can get away from doing that with the add-in. Well, let's show Julio how we can do that with JMP. Okay, sounds like a good idea. All right, Julio, let's jump over to some examples. Here is JMP, and to get to the add-in, you just go to your Add-Ins menu, JMP Sampling Plans, and let's look at attributes sampling plans. For this particular one, we're going to do just a single lot acceptance sampling plan. We have the menu, and after we make that choice, it comes up and gives us three options: we can evaluate an attributes plan, we can generate, or create, an attributes plan, or we can compare plans. The add-in gives us the ability to compare up to five different plans. There's also an option to keep the dialog open in case you want to look at more than one plan. Let's go ahead and generate the plan you were just sharing from Dr. Montgomery's book. I'll give you the numbers. Let's see. All right, this is the interface. [inaudible 00:24:51] AQL. Right. You have a couple of different sections in the interface: you put in your quality levels and your probabilities, and then you have the optional area about the type of lot sampling you're going to do. That lot sampling is based on a distribution, so we can either use a hypergeometric distribution or a binomial distribution. Let's put Montgomery's numbers in there. The AQL in Professor Montgomery's book is 0.02, or 2%, and the RQL he has is 0.08. All right. Oh, the probabilities are pre-populated. Yes, those are the standard values, 0.95 and 0.10, so we keep the defaults there. I think you told me there was no theoretical lot size here, so we're going to do type B. Actually, in Montgomery's book it says he's using a binomial nomograph, meaning he's using the binomial distribution, so yes, that's the right choice. Then all we have to do is hit OK, and what happens? JMP gives us this curve, an output window, and a report. There are three sections to the report. At the top you get a summary of the plan: it shows your input parameters and then the sample size and acceptance number the plan generated. Then it gives you some information on how to interpret the plan. That was helpful. Out of the plan recommendation here, of the 98 samples, it says you can accept the lot as long as the number of defectives is less than or equal to four out of the 98 sampled; otherwise, if it's greater than that, you reject it. Then it tells you some additional information there. If we look at that OC curve, I think this is what you were showing us earlier. Yes. So we have a probability of 95% at the quality level of 2%. But if I recall, you told me the literature said this was a plan of 90 and 3.
So I'm confused again here, or, should I say, Julio texted me and said he's confused. How is it different here? Why is this different? We're getting 98 and 4, while Professor Montgomery, in his book, is showing the nomograph and getting 90 and 3. Let me go back there for just one second. Remember, what he's using is this graph with all these approximations. You have a line that goes 200, 300, and you go between 70 and 100. Got it. There's not really a line at 93 or 94, anything like that. As I said, you have to approximate; this nomograph is just an approximation. What we're actually trying to do, for those of you mathematically inclined, is solve these equations: the probability of acceptance at p1, the AQL, must be at least 1 minus alpha, which is the 0.95, and the probability of acceptance at p2, the RQL, must be at most beta, the 0.10. If we put all four of those quantities in and solve these equations to find the minimum n that satisfies them, and the software, the code that we wrote, actually does that for us, we get the 98 and 4. I get that. The moral of the story here is that the nomograph gives an approximate lot acceptance sampling plan, because it's just an approximation game, and this is one of the advantages of using the add-in, because you get, with the sampling plan add-in, of course, a more exact sampling plan. You showed that we can evaluate a sampling plan; let's do that. Why don't we evaluate this plan, the 90 and 3? Show us how to do that. Since I left that window open, we don't have to go back to the menu again. Let's evaluate a plan now. In that interface, again, we're going to put in our previous 0.02, 2%, and I think you told me the RQL was 8%. Yes. This time, though, instead of the 98 the add-in generated, we're going to enter a sample size of 90 and evaluate an acceptance number of three. Again, it's binomial. Here, you're entering the four quantities we talked about in the generation, but you also enter the actual sampling plan that you want to evaluate. Exactly. Now, when I say OK, it's going to look... Sorry about that, I'll just redo that quickly. [inaudible 00:30:31], no? Yes, sorry. I want to evaluate that plan: 90 and 3. Three, yes. When I rerun that, we get the exact same style of report, but the information is slightly different, I see. I notice also, down in this table, there's some color coding and direction on the arrows. Yeah, I see some red there. I see some red, yes; that's an issue. Is this an indication that the specified quality level is better? Actually, no. The reason it's red is because of what this is telling us: if I use a sample size of 90 with an acceptance number of three, you can see the associated probability of acceptance is 0.89. Remember, we wanted that to be 0.95, so it's actually lower, and that's why it's red. The add-in is giving you a signal that your probability of acceptance is less than the one you specified, and that's an issue. That's why the 98 and 4 is a better plan. Also, you can see at the bottom that the probability of accepting a defective lot is about 6% rather than 10%; in that case it's blue, because that probability is better: you're going to have a smaller consumer's risk. You can see the 6.47% there, versus the producer's risk of 10.67%. That is what's happening. So that producer's risk is really just one minus the associated probability of acceptance there? Exactly. Got it. Exactly. The producer's risk, we want it to be 5%, and in this case it's about 10%. Now, this is something that I haven't seen in any other software.
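A plain brute-force sketch of the search just described, finding the smallest single-sampling plan (n, c) whose probability of acceptance is at least 1 minus alpha at the AQL and at most beta at the RQL under the binomial model, is shown below. It is an illustration of the calculation, not the add-in's code.

```python
from scipy.stats import binom

def generate_plan(aql, rql, alpha=0.05, beta=0.10, n_max=5000):
    """Smallest-n single-sampling attributes plan (n, c) such that
       P(X <= c | n, aql) >= 1 - alpha   (limits the producer's risk)
       P(X <= c | n, rql) <= beta        (limits the consumer's risk)"""
    for c in range(0, 200):
        for n in range(c + 1, n_max):
            if binom.cdf(c, n, rql) <= beta:          # smallest n controlling the consumer's risk
                if binom.cdf(c, n, aql) >= 1 - alpha: # does it also meet the producer target?
                    return n, c
                break   # larger n only lowers P(accept) at the AQL, so try a larger c instead
    raise ValueError("no plan found within the search limits")

n, c = generate_plan(0.02, 0.08)
print(n, c, binom.cdf(c, n, 0.02), binom.cdf(c, n, 0.08))
# This search should land at (or very near) the n=98, c=4 plan reported by the add-in in the demo.
```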
Normally, what you see is the plot on the left. You get that for the AQL and RQL you specified, and it assumes that the probabilities are the ones that you specify, but they're not. What the add-in does, too, is flip things around and say, okay, rather than fixing the AQL and RQL, I fix the probabilities. I want them to be 0.95, because that's what it is, I'm neurotic that way. I want them to be 0.95 and 0.1. Then what are the corresponding AQL and RQL for that? That's what that's telling me. Okay. So I wasn't really just seeing double. It's really a different calculation on the right-hand side. Got it. What that says is, if the probability of acceptance is fixed at 0.95, then the AQL is not 2% but 1.53%. Okay, got it. It has to be less than that. So it's a... Okay, go ahead. Also, at 0.1, it's not 8% but 7%. Okay. 7.3%, roughly. That's why, in both cases, they are blue. They are blue. Again, just very quickly here to show this. In summary, for this one, again, the nomograph gives an approximate lot acceptance sampling plan. The 90 and 3 plan shows you that we're not hitting that. Are you sharing? Yes, I'm sharing. Okay. Hopefully, people can see that. Our producer's risk is now 10% versus 5%. But there is one more thing that you had there, which is compare. I'm curious, can we use that? I'll tell you what I want to do. I want to use the add-in to compare the plan in Professor Montgomery's book, the 90 and 3, with the plan that the add-in gave us, which is 98 and 4. All right. We did mention that we can compare up to five different plans. You want me to compare the two plans that we just created. All right. Again, that 8% RQL. In this case, we had a 90 and an acceptance of three. Yeah. Then I think you told me it was 98 and 4. That's what... I didn't tell you. That's what the add-in gave us. Yeah. Wow, that is pretty slick. Again, if you wanted to do more than two, you would just check the additional rows, and the add-in will calculate up to five different comparisons here. I'm going to go ahead and click OK. Now it looks a little bit different, Jose. Now I'm seeing two OC curves. I'm getting a comparison of both of my OC curves on the same plot, I see here. Exactly. Here the blue curve is the 90, 3 plan and the red curve is- The blue is the 90, 3. Yeah, exactly. I see that. The red curve is the 98, 4. You can see that the red line, or curve, is on top of the blue curve. That's what you want to see. You want the curve to be on top, literally. This shows you that you have higher probabilities of acceptance, for the AQL and RQL that we prespecified, with the 98, 4 plan than with the 90, 3 plan. This is very helpful, because you may be in situations where you get an approximate plan from a book, or someone may suggest, "Hey, why don't you use this plan?" With this, you can compare them all and see which one is better. It's easier to negotiate the sample size using these tools than just getting into an argument and saying, "No, we should use 90 and 3 because that's in the book," or something like that. I think that's a great feature, to be able to compare; that way you move the discussion onto actual information rather than something subjective, and individuals can now compare directly. Again, for the reference lines, it didn't make sense to show all the reference lines across five different curves.
What we've done is display a single set of reference lines that you toggle with the filter, and it will update the graphs for you as you toggle between them. That way, if you do want to see each plan individually, you can, and then you're just focused on the table and the graphs for that particular set of observations. That's, I think, a nice feature, Julio. It is. All right. It's time for one more. [inaudible 00:38:32]. I think that Julio is probably getting tired there. He is very exhausted. Julio, there are some other things that the add-in can do, so maybe there should be a follow-up to this. Maybe we could do another session for Discovery Summit Japan or Discovery Summit China. We'll continue showing. [inaudible 00:39:05]. I think I'm getting breaking news coming across here, just into the news center of the show. Anyone, it looks like, can get that sampling plan add-in. If you just go to the JMP User Community and search for the sampling plan add-in, you can download it yourself. And that's free, no? That's free? Absolutely free. We would never charge for that on the community. Go ahead and download it. If you have feedback or you run into issues, feel free to message me, and we'll get those defects entered and get a corrected version out there as soon as we can. Other things, just before we wrap up here: I just want to say thank you to Jose. Thank you for joining us on the JMP sampling plan show today. It was great to have you here. Really great. Thank you again.
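For readers who want to try the plan-generation calculation Jose describes above, the search for the minimum n can be sketched in a few lines of JSL. This is only a rough sketch under the talk's assumptions (type B sampling with a binomial distribution, producer's risk of 0.05 at the AQL, consumer's risk of 0.10 at the RQL); it is not the add-in's actual code, and Binomial Distribution() is simply JSL's binomial CDF.

// Sketch: find the smallest single sampling plan (n, c) such that
//   P(accept | p = AQL) >= 1 - alpha   and   P(accept | p = RQL) <= beta,
// where P(accept) = P(X <= c) for X ~ Binomial(n, p).
p1 = 0.02;     // AQL
p2 = 0.08;     // RQL
alpha = 0.05;  // producer's risk target
beta = 0.10;   // consumer's risk target
found = 0;
For( n = 1, n <= 1000 & !found, n++,
	For( c = 0, c <= n & !found, c++,
		paAQL = Binomial Distribution( p1, n, c ); // probability of acceptance at the AQL
		paRQL = Binomial Distribution( p2, n, c ); // probability of acceptance at the RQL
		If( paAQL >= 1 - alpha & paRQL <= beta,
			found = 1;
			Show( n, c, paAQL, paRQL );
		);
	);
);

With Montgomery's inputs, a search like this lands on the exact 98-and-4 plan discussed above rather than the nomograph's approximate 90 and 3, which is the point Jose makes about the add-in.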
Labels
(9)
Labels:
Advanced Statistical Modeling
Automation and Scripting
Basic Data Analysis and Modeling
Data Exploration and Visualization
Design of Experiments
Mass Customization
Predictive Modeling and Machine Learning
Quality and Process Engineering
Reliability Analysis
0 attendees
0
0
0 attendees
0
0
Disallowed Combinations and Operating Region Optimization for CQAs with the JMP® 17 Profiler (2023-EU-30MP-1305)
Saturday, March 4, 2023
The prediction profiler in JMP® is a powerful tool for visualizing and optimizing models from designed experiments. This presentation will focus on new features in the prediction profiler for exploring and optimizing models with known constraints and for determining factor ranges that assure quality as defined by the specifications associated with Critical Quality Attributes (CQAs), thereby solving a fundamental Quality by Design (QbD) problem. While previous versions of JMP were able to create designs that respected disallowed combination constraints (combinations of factors that are known in advance to be physically impossible or undesirable), the model exploration and optimization in the profiler in the last step were unable to obey these constraints. We will demonstrate how the profiler, since JMP® 16, handles these complex design constraints automatically when exploring the model and performing optimization. We will also demonstrate how the Design Space Profiler, new in JMP® 17, finds subregions of the design space that maximize the probability of maintaining product quality for the CQA specifications while maintaining maximum flexibility. These two capabilities make the prediction profiler indispensable for high-quality product and process innovation. Hello. My name is Laura Lancaster. I am a statistical developer in the JMP group. Today, I'm here to talk to you about disallowed combinations and operating region optimization for critical quality attributes with the JMP 17 profiler. Everything we're going to talk about today has to do with the Prediction Profiler, and I hope that everyone is familiar with it. It's a wonderful tool. But if you're not, the Prediction Profiler is a tool in JMP that's great for interactively exploring, visualizing, and optimizing the models that you create in JMP. Specifically, we're going to talk about two recent new features that were added to the Prediction Profiler. The first is the ability to explore and optimize the models that you've created in DOE in JMP that have known disallowed combination constraints. The second is the ability to determine an optimal operating region for your manufacturing processes that ensures both quality and maximum production flexibility. Let's go ahead and get started talking about exploring and optimizing models from designed experiments with disallowed combination constraints. It often happens that when you're designing experiments, it's not possible, or it's not desirable for various reasons, to experiment over the usual entire rectangular design region. When that happens, you need to be able to apply constraints to your design region before you create the design, and certainly before you run the design. Thankfully, ever since JMP 6, which has been a long time, the custom design platform has been able to create designed experiments with constrained design regions. Since then, constraint support has also been added to fast flexible filling designs and covering array designs. Now, what types of constraints are available in JMP's DOE platforms? The first type of constraint is the simpler of the two: linear constraints on continuous and mixture factors. Here's a picture where we have two linear inequality constraints that are shown in the gray shaded region. Then the design, you can see, stays out of the disallowed linearly constrained region. The next type of constraint is called a disallowed combination constraint. It's a more general, and can be a more complicated, type of constraint.
It can consist of continuous, discrete numeric, and/or categorical factors. What it is is a constraint written as a JSL Boolean expression that evaluates to true for factor combinations that are not in your design region. Here's an example. We have a two-factor design where X1 at level 1 combined with X2 at level 3 cannot be in the design. They're disallowed, and they're written as a JSL Boolean expression, which you can see right here. Notice that this design is created and stays out of this disallowed region. Now, originally, all of these disallowed combination constraints had to be entered as JSL like this. But then in JMP 12, a disallowed combinations filter was added that made it easier to create these JSL expressions if you have fairly easy disallowed combinations, such as individual factor ranges combined with and/or expressions. We'll look at an example of this shortly. Now, what about the Prediction Profiler with constrained regions? Why is it important for the Prediction Profiler to be able to obey constraints when you have models with constraints? Well, if the profiler ignores constraints, then it's possible that the user could navigate to predictions that are not feasible and not realize it. You could end up in an area that's not possible, not desirable, and where you certainly haven't tested. It's an extrapolation, so this is bad. Then, probably even worse, if you want to optimize your model, you could end up with an infeasible optimal solution. If that happens, the user would have to either try to manually find a feasible optimal solution, which could be really hard or even impossible, or use another tool. What were the challenges with getting the Prediction Profiler to obey constraints? Why did it take so much longer to get these constraints into the profiler versus DOE? Well, the main reason had to do with the constrained optimization. The desirability function is a nonlinear function. That means that our optimization has a nonlinear objective function and possibly both continuous and categorical factor variables involved in constraints. This is known as a mixed integer nonlinear programming problem, and it's an extremely difficult type of optimization problem unless you know something favorable about your objective function or your constrained region. It's just very, very hard. But good news: the Prediction Profiler now works with all the same constraints as the DOE platforms. It turns out that the Prediction Profiler has actually obeyed linear constraints on continuous variables all the way back to JMP 8, just a couple of releases after they were added in JMP 6. We were able to do this sooner because these constraints, linear constraints on continuous variables, have really nice properties. Because of that, we were able to implement a Wolfe reduced-gradient variant algorithm. That algorithm does a really, really good job of finding the global optimum, especially if you don't have categorical variables. In that case, you should find the global optimum. Now, since JMP 16, the Prediction Profiler also obeys disallowed combination constraints on both continuous and categorical variables. This was a lot harder because these constraints are very general. You could put absolutely anything inside that JSL Boolean expression, so we cannot assume anything favorable about our constrained region in these cases. Thus, we had to implement a genetic heuristic algorithm, which is a very general type of algorithm, for the constrained optimization.
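To make that concrete, here is roughly what such a JSL Boolean expression looks like, sketched from the two-factor example above. The factor names and the convention of referring to categorical levels by their level numbers are assumptions made for illustration, not the exact script shown on the slide.

// Sketch of a disallowed combinations expression: it evaluates to true
// exactly for the factor settings that must be kept out of the design.
// Categorical levels are assumed to be referenced by their level numbers.
(X1 == 1 & X2 == 3) |        // the slide's example: X1 at level 1 with X2 at level 3
(X3 >= 650 & X1 == 2)        // hypothetical: a continuous factor X3 combined with a level of X1

Nothing limits what can go inside an expression like this, so the feasible region it carves out can have essentially any shape, which is exactly why such a general-purpose search method is needed.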
B ecause of this, we can't guarantee a global optimum solution. But you should find a solution that's very close to global optimum, if not the global optimum. Let's go ahead and start looking at some examples. First, we're going to look at a chemical reaction experiment. This experiment has one response, and the goal is to maximize yield. We have three factors. Two of them are continuous, time and temperature, and catalyst is categorical. We have two constraints. When catalyst B is used, temperature must be above 400. When catalyst C is used, temperature must be below 650. W e used Custom DOE to create a response surface design with dis allowed combinations. Because these are fairly simple constraints, we were able to use the disallowed combinations filter. You can see here that when I set catalyst to B, temperature cannot be below 400. This is my first disallowed region. Also if catalyst is C, the temperature cannot be above 650. Then once we created the design, you can see that the design points stay out of the constrained regions that are gray here. These are the dissolved combinations regions. Then we ran the experiment and we used Fit Least Squares to fit a response surface model to the data. Now, I want to show you how you would use the Prediction Profiler to explore the model and find the maximum yield. I'm going to go get out of PowerPoint and go to JMP really quickly. Here is the data table from the chemical reaction experiment that we created with JMPs custom design platform. We've already run it and entered all the results. The important thing I want to point out is that Custom DOE added this data table script called Disallowed Combinations. When you open it up, you see it's got the JSL Boolean expression of my disallowed combinations. And this is what the Prediction Profiler reads in, and that's how it knows about my disallowed combinations constraints. I've already saved my response surface model, and I'm going to run it and go down to the profiler. Because I have that disallowed combinations constraint saved in the table, it's able to read those in and the profiler can obey the constraints. I f I set catalyst to B, notice that I cannot get to a temperature 400 or below. If I set catalyst to C, I cannot get to a temperature 650 or above because those are disallowed regions. Also, when I maximize yield, I end up with a solution that is feasible, it's not in a disallowed region. Now, what would have happened in a version of JMP prior to JMP 16? Well, we can see what would have happened by looking at the exact same data table without the disallowed combination script. I'm going to run the same exact model and go to the profiler. Now this time, the profiler doesn't know about my constraints. So when I set catalyst to B, I can go down into a disallowed region down to 350. Catalyst C, I can wander up into another disallowed region, temperatures above 650. W hen I do the optimization, I do end up with an infeasible solution. I'm in the disallowed region where catalyst is C and temperature is 750. I would be forced to have to try to manually find a feasible solution that's not in a disallowed region. But thankfully, that's been solved since JMP 16. Let's go clean up and let's go to another example. Okay, the next example we're going to look at is a tablet production experiment. The goal of this experiment is to maximize dissolution. We have five factors. Four are continuous and one is categorical. We have two constraints. 
The first constraint is that when screen size is 3, mill time has to be below 16, and my spray rate and coating viscosity follow this nonlinear constraint. I used Custom DOE next to create a response surface design with disallowed combinations using these two constraints. Because this is a complicated constraint, we could not use the disallowed combinations filter, so we had to enter it as a script, which is not hard to do. Here's where I've entered that nonlinear constraint as a script. Notice I've flipped the inequality to show what's disallowed instead of what should be allowed. T hen I've also added screen size equals 3 and mill time greater than 16 as the other disallowed region. Now, we can see by looking at two different slices of my design. This first graph is spray rate versus coating viscosity. I can see that all the design points stay out of the disallowed region set by this nonlinear constraint. W hen I look at screen size versus mill time, when screen size is 3, m ill time cannot be above 16. Then we ran the experiment, and we used Fit Least Squares to fit a response surface model to the data. N ow we're going to use Prediction Profiler to explore the model and find the maximum dissolution. I'm going to go back to JMP. This is the tablet production experiment that was produced by JMP's Custom DOE platform. N otice that once again, it has saved the disallowed combinations data table script to the table. I'm going to look at that. You see that it's the JSL Boolean expression of my dis allowed combinations, and this is what the profiler will read in. I've saved the response surface model to the table. When we go to the profiler to explore the model, you can see that it obeys my disallowed combinations constraint. When screen size is 3, mill time cannot be above 16. Also, spray rate and coating viscosity obey that nonlinear inequality constraint. When I maximize the solution, I end up with an optimal solution that's feasible and notice that it's actually on the constraint boundary. T hat tells me that if I had not been recognizing the constraints, I almost certainly would have ended up with an optimal solution that wasn't feasible, and I would have had to try to manually find it, which would have been very difficult, if not impossible. All right . Let's move on to the next topic. Here we go. Back to PowerPoint. Okay . Our next topic is operating region optimization for critical quality attributes. This is where I'm going to introduce the new Design Space Profiler that's new to JMP 17. What do we mean by design space when we're talking about the Design Space Profiler? Well, this is an important concept that's used in pharmaceutical development that identifies the optimal operating region that gives maximal flexibility of your production while still assuring quality. This concept was introduced by the FDA and the International Conference on Harmonization when those agencies decided to adopt Quality by Design principles for development, manufacturing, and regulation of drugs. W hen they did that, they put out some really important guideline documents, ICH Q8-Q12, that most drug companies follow. Specifically, we want to look at ICH Q8 ( R2) , which covers design space. It defines design space as the multidimensional combination and interaction of material attributes and process parameters that have been demonstrated to provide assurance of quality. 
Now, there are a number of steps that need to be taken to determine design space for a product, and several of them need to be done before you can get to the Design Space Profiler and JMP. One of the first things that you need to do is you need to determine what your critical quality attributes are and what the appropriate spec limits are to maintain quality. We'll refer to these critical quality attributes as CQAS. The ICH document defines a critical quality attribute as a physical, chemical, biological, or microbiological property or characteristic that should be within an appropriate limit, range, or distribution to ensure the desired product quality. This is the important first step. Next, we want to use designed experiments to determine what are our critical manufacturing process parameters that affect those critical quality attributes. We'll refer to these as CPPs, critical process parameter, because ICH Q8 defines a critical process parameter as a process parameter whose variability has an impact on a critical quality attribute and therefore should be monitored or controlled to ensure the process produces the desired quality. Then, once you've determined your CQAs and your CPPs, then you want to find a really good prediction model for your CQAs in terms of your critical process parameters. Once you've done all of that, you can use the Design Space Profiler to determine a good design space for your product. Let's talk a little more specifically about the Design Space Profiler and JMP. The goal of the Design Space Profiler is to determine a good design space by trying to find the largest hyper rectangle that fits into the acceptable region that's defined by your critical quality attribute specifications applied to that prediction model that you found. Once you found that hyper rectangle, it will give the lower and upper limits of your critical process parameters that determine a good design space. The problem is that that acceptable region is usually non linear, and finding the largest hyper rectangle in a non linear region is a very, very difficult mathematical problem. Because of that, we wonder how does the Design Space Profiler actually determine Design Space then? Well, instead of trying to find the largest hyper rectangle mathematically, we use a simulated approach. What it does is it generates thousands of uniformly distributed points throughout the space defined by your initial CPP limits. Then it uses that prediction model that you found to simulate responses for your CQAs. Note, because your prediction model is not without error, you should always add response error to your simulations. Once you've got your simulated set, it calculates an in-spec portion, accounting the total number of points in that set that are in-spec for all your CQAs from all the points that are within the current CPP factor limits. This is easiest to see by actually looking at an example and going to JMP and looking at the Design Space Profiler. That's what we're going to do next. We're going to look at an example of a pain cream study. The goal of this study was to repurpose a habit- forming oral opioid drug into a cream that provides the same relief as the oral drug. T he first thing that we needed to do was determine our critical quality attributes for this drug. We determined that there were three of them entrapment efficiency, vesicle size, and in- vitro release. We also needed to determine what are the spec limits that assure quality. That's what these numbers are. 
Next, we ran experiments to determine which of our manufacturing process factors affect these critical quality attributes. It turns out there were three of them. They are emulsifier, lipid, and lecithin, and these are the initial factor limits for these CPPs. Next, we used custom design and Fit Least Squares to find response surface models for our three critical quality attributes in terms of our three critical process parameters. Once we did all of that, now we're able to go to the Design Space Profiler and JMP to determine a design space for this pain cream. Let's go back to JMP. I'm going to open up my pain cream study. T his was my response surface model design created in JMP's DOE platform. I've got my design in terms of my three critical process parameters here, and these are my three critical quality attribute responses here. The important thing I want to point out is that for each of these critical quality attribute responses, I've saved spec limits as column properties. T hat is because the Design Space Profiler has to know what the spec limits are for your critical quality attributes. So if you don't enter them as column properties, you'll be prompted to enter them once you launch the Design Space Profiler, unless you've added them here. I've already saved my response surface models as a script. I'm going to run that script. It launches Fit Least Squares, and I have it set up to automatically show the Prediction Profiler. This is the same Prediction Profiler that you're probably used to seeing. I have my three responses, my critical quality attributes here, my three critical process parameters, my factors here, and I can explore the model as usual. But now I want to try to figure out a design space for my manufacturing process. Now I can easily do that by going to the production profiler, little red triangle menu, and several down. I see there's a new option for Design Space Profiler, and if I select that right below the Prediction Profiler, the Design Space Profiler will appear. As I noted, if I hadn't already had spec limits attached to my responses, it would prompt me for that. But now I can see that it's brought them in from my column properties. You can see right down here. It's also brought in an error standard deviation. These values are coming from the root mean squared error of my Least Squares models. Y ou can see here, RMSE is here, is the same value for in-vitro release as the error standard deviation here. Of course, you can change these, you can even delete them. But we highly recommend that you have some error for your predictions since your predictive models are not perfect, not without error. Okay. The first thing you might notice about this profiler is that it looks a little different in that each factor cell has two curves instead of the usual one curve. That's because we're trying to find factor limits. W e're trying to find an interval, we're trying to find the operating region, the design space where we're optimizing our operating region. The blue curve— we have a legend to help us— this represents the in-spec portion as the lower limit changes, and the red curve represents the in-spec portion as the upper limit changes. You can see how if I were to change the upper limit of emulsifier, it would increase my in-spec portion. That would be a good thing. That's how that works. Also the in-spec portion, you don't see the value over here on the left like you usually do, but it's right over here to the right of the cells. 
It's initially, 79.21% of my points are in-spec and that's in-spec for all of the responses to all of the CQAs. If you want to see the individual in-spec portions, you can find them down here next to the specific response. Also, you can notice this volume portion is telling me that I am currently using all of my simulated data and that's because the factor limits are set at their full range initially. To be able to change the factor limits or try to change the operating region, you can either move the markers as usual or you can enter different factor limit values here in this table or right here below the cells or you can use these buttons . I really like these buttons. If I click on Move Inward, it's going to find the biggest increase in in-spec portion. It's going to find the move that gives me the biggest increase. It's going to find the steepest upward path . Move Outward would do the opposite. It would find the steepest path downward. If I click Move Inward, notice that my emulsifier lower limit has increased from 700 to 705, and my in-spec portion has increased to 81.95. If I click it again, now my lecithin lower limit has increased from 30 to 31, and my in-spec portion has gone up to 84.5. I can keep doing this. But before I keep doing this until I find the desired in-spec portion that I like— and I'm happy with the factor limits, I think it's a reasonable operating region— there are several options in the Design Space Profiler menu that I like to look at. The first one is make and connect to random table. W hat this does is it creates a new random table of uniformly distributed points. You always want to add random noise. It's going to use the same random errors we used before. I'm going to click Okay. Now, I get this table of 10,000 new random points, and they are color- coded. The ones that are marked as green are in- spec, the ones that are red are out of spec, and the ones that are selected are within my current factor limits, my current operating region. It's useful to look at the table, but I really like to look at these graphs that are produced by some of these saved scripts. If I run Scatterplot Matrix Y, it will give me a response view of all my data . The shaded region that's green here is the spec limits. T hen I also like to look at the Scatterplot Matrix X, which gives me the factor space view. It's nice if I can look at them both at the same time. While I'm altering my factor limits, if I click on Move Inward again, you can see how the points change . I find it even more useful. You also see how the factor space changes. I find it even more useful to hide all the points that are not in my current operating region, then I don't even have to look at them. Now, as I keep clicking on Move Inward, you can see how that operating region is shrinking. If you only want to be concerned with the out- of- spec points, you can click on Y Out of Spec, and that will only show the out -of- spec points that are occurring. Notice that my in-spec portion, as I keep moving my factor limits in, is increasing . I'm going to keep going until I either hit 100% or my operating region looks like something I can't that isn't feasible, that I just won't be able to attain. I'm going to keep clicking Move Inward. Things still look good. Move Inward, just going to keep clicking it. Okay, I hit 100, and I still think that these factor limits represent an operating region that I think I should be able to attain. 
To be able to look at that further, I can send the midpoints of these factor limits to the original profiler and see what that looks like. I think that looks pretty good. I can also send the limits to the simulator in the Prediction Profiler, and I can decide to use different distributions. I actually think that my critical process parameters follow normal distributions, so I'm going to select this Normal with Limits at 3 Sigma option. It turns on the simulator, sets my distributions to normal, and figures out the means and standard deviations for these limits with 3 sigma. Of course, you can change all these values as seems fit for your own situation, for your own manufacturing process. You can change the distributions, you can change the means and standard deviations. I'm just going to leave it, and I'm going to see what simulating with the normal distributions looks like. It looks really good. You can see my defect rate. When I keep hitting Simulate, it's often 0. I also like to simulate to a table to get a view of what my capability analysis would look like, just as a sanity check. If you come down here, you can simulate to a table, and it's going to use these normal distributions for the critical process parameters. It's going to use the same errors for your predictions as we used before. I'm going to click Make Table, and when I do that, it automatically creates some scripts. One of them is Distribution. If I run that, I can very easily look at my capability reports, because I saved my spec limits as column properties. I see that the capability, at least for the simulated data, looks really quite good. So I'm pretty happy with this, even though this is just on the simulated data. Of course, I need to check the real data, but I'm really happy with what I'm seeing so far. I think I'm going to use these limits as my design space. Now, just to note before I go further, I have a good situation here. But let's say that you didn't have a good situation, where your in-spec portion wasn't where you wanted it to be, and you really can't adjust your factor limits anymore. You could do what-if scenarios by changing your spec limits or your errors, if you think that is something that could reasonably happen. But I have a good situation, and I'm happy. I am going to use this option, Save X Spec Limits, and that's going to save these factor limits back to my original data table, to my critical process parameters, so I can save these factor limits. When I do that and go back to my original table, you can see that those factor limit settings have been saved as spec limits on my critical process parameters. I find it really helpful to be able to look at this design space in terms of the contour responses and the acceptable region. I've already saved my predictions as formulas, and I have a script saved to run the Contour Profiler. I'm going to run that. This is going to give me my contour responses for all combinations of my factors, my critical process parameters. I don't know if you can see the faint rectangles, but that is my design space as defined by those factor limits that got saved as spec limits on my critical process parameters. The shaded colored areas are my spec limit response contours. You can see that my design space is nicely within an acceptable region for all these contours. It's even further in; it's not touching them. That's because we added that error in for our predictions. I'm really happy with this. Okay, let's get back to PowerPoint.
I just want to give you a few takeaways about the Design Space Profiler before we wrap up. First of all, that in-spec portion that we saw in the Design Space Profiler shouldn't be taken as a probability statement unless you believe that your factors, your critical process parameter factors, actually follow a uniform distribution because that's what was used to distribute them. Also, the Design Space Profiler is not meant for models that have a large number of factors or very small factor ranges because of the simulated approach that it takes. It's also recommended, as I've mentioned several times, to always add random error to your responses because your prediction models are not without error. And finally, I just wanted to make a statement that even though this was motivated by pharmaceutical industry, it really is applicable much further than that. In any case where you want to find an optimal operating region and you want to maintain flexibility and quality, then this can be helpful. There were many things about the Design Space Profiler I didn't have time to show. I really hope that you will check it out. Any questions?
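To make the in-spec and volume portion calculations described in this talk a little more concrete, here is a minimal JSL-style sketch of the idea. It is not the Design Space Profiler's actual code: the factor ranges, spec limits, and the pred() stand-in for a saved prediction formula are all placeholders, only one CQA is simulated to keep it short, and the real tool works with every CQA at once.

// Sketch: estimate the in-spec portion for one candidate operating region.
// All numbers and the pred() formula below are made-up placeholders.
nSim = 10000;
lsl = 70;  usl = 90;                                 // spec limits for one hypothetical CQA
lowFull = {700, 25, 1};   highFull = {800, 40, 5};   // full factor ranges
lowOp   = {705, 31, 1};   highOp   = {790, 38, 5};   // candidate operating region
sigma = 1.5;                                         // response error, e.g. the model RMSE
pred = Function( {x1, x2, x3},                       // placeholder prediction formula
	75 + 0.02 * (x1 - 750) + 0.5 * (x2 - 32) - 0.8 * (x3 - 3)
);
inRegion = 0;  inSpec = 0;
For( i = 1, i <= nSim, i++,
	x1 = Random Uniform( lowFull[1], highFull[1] );  // uniform points over the full space
	x2 = Random Uniform( lowFull[2], highFull[2] );
	x3 = Random Uniform( lowFull[3], highFull[3] );
	If( x1 >= lowOp[1] & x1 <= highOp[1] & x2 >= lowOp[2] & x2 <= highOp[2] & x3 >= lowOp[3] & x3 <= highOp[3],
		inRegion++;                                  // point falls inside the candidate limits
		y = pred( x1, x2, x3 ) + Random Normal( 0, sigma );  // simulated CQA with added error
		If( y >= lsl & y <= usl, inSpec++ );
	);
);
Show( inSpec / inRegion );   // in-spec portion for the candidate region
Show( inRegion / nSim );     // volume portion

Tightening the candidate limits and recomputing is, in spirit, what the Move Inward button does: it looks for the limit change that raises the in-spec portion fastest.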
Labels
(9)
Labels:
Advanced Statistical Modeling
Basic Data Analysis and Modeling
Consumer and Market Research
Data Blending and Cleanup
Data Exploration and Visualization
Design of Experiments
Predictive Modeling and Machine Learning
Quality and Process Engineering
Reliability Analysis
0 attendees
0
0
0 attendees
0
0
Easy DOE: Easy Enough for a 7-Year-old? (2023-EU-PO-1231)
Saturday, March 4, 2023
JMP 17 introduces the Easy DOE platform, providing both flexible and guided modes to users, aiding their design choices. In addition, Easy DOE allows for the DOE workflow from design through data collection and modeling. This presentation offers a preview of the new Easy DOE platform, including insights from a 7-year-old using the new platform on a DOE problem of her choosing. Hello. I'm Ryan Lekivetz, Manager of the DOE and Reliability team at JMP. And I'm Rory Lekivetz. We're here today to talk to you about Easy DOE. The question is, is it easy enough for a seven-year-old? Now that you're eight, you're older and a lot wiser to answer that question. For those of you who don't know about Easy DOE, it's a new platform in JMP 17. The idea with Easy DOE is that it's a new file type that encompasses the design through the analysis of a designed experiment. No more do you need to worry about splitting up, going from the DOE platform to a data table and then running the analysis separately. The idea with Easy DOE is that we're trying to aid novice users through that entire workflow. There's a guided mode where we've tried to add hints and useful defaults to guide those users, while at the same time having a flexible mode for those who are more comfortable with DOE. Now, before we started this idea with Easy DOE and running our experiment, I did talk to Rory about the DOE workflow. If you open up the DOE documentation, we outline this idea of a DOE workflow, which goes through the Describe phase, where we identify the goal, the responses, and the different factors; Specify, where we're looking at our model; then we create the design, collect the data, fit a model to that data, and then use that model to predict. Right now, if you think about the way we've traditionally done this in JMP, the design phase is where we create the data table. Using that data table, the experimenter goes and collects the data and then performs the remaining steps after that. Now, what you'll see in Easy DOE is a tabbed interface where each tab represents one of these steps in the DOE workflow. Now, what was the experiment that we did? Paper airplanes. Rory had found a website that talked about different ways to create paper airplanes. You want to tell them what the response was? What were you trying to measure? We were trying to measure the distance, which was in inches. What factors did you end up deciding that we could change? For factors, we decided on plane type, paper type, throwing force, and paperclip. Yeah. Now, you tell them about some of these different tabs. Okay, so let's start. What was the Define tab? The Define tab was where you got to choose your factors and your responses. That's right. I should mention here as well that when we were using Easy DOE, I left Rory in control of the entire platform. She launched it. She was the one entering everything and clicking between tabs and all of that. I think after the Define tab, we moved to the next one. What was that next tab? Model. For the Model tab, you had to choose which one of these four was the best for your experiment. Now, I'll say too, on this one, this is where we had to talk a little bit more about what these different model types mean. Of course, for a seven-year-old, and even an eight-year-old now, that idea of understanding interactions can be a difficult thing. Now, the main effect versus the interaction.
One of the nice things was the website that we had found about creating paper airplanes. It talked about how some of the different types of paper airplanes do better when you throw them hard versus light. It already had discussed that idea of interactions, so that's why, ultimately, I helped her decide on picking that two-factor interaction model with the main effects. Once we had that model, then what happened? Then was the design. The design shows you what you're going to be making. Since we were doing paper airplanes and we entered the factors for plane type, paper, throwing force, and paper clip, it laid out the different combinations: different types, different papers, different throwing forces, and paper clip or no paper clip. Yes, I think we made the 16 different paper airplanes, and so each one was a different one. I think we put a number on it. Is that right? We labeled each one with a number. Yeah. Then what happens after we have that design, what do we do with that? Then we do the data entry. Data entry is where you enter in how many inches it flew. Yeah. I think we went outside, took those paper airplanes, flew them, and then just measured that. Yeah, that's right. What happened after we had that data entry? Then we go to Analyze. Analyze is where you figure out which ones were the best. Yeah, which ones really were impacting that distance flown. Now, I should mention here, this is a novel thing in Easy DOE: the confidence intervals for each of our different effects are clickable. All right, I had actually thought that this was going to be a really difficult part to talk about, the Analyze tab. But as soon as we got there, it starts with all of the terms in the model, and she very quickly figured out that you could click on them. She looked at the ones that were close to zero and just removed them. On the top was actually her model. She actually picked a much simpler model than even the best model. You'll notice there is a best model one. But one could argue that I actually might even prefer her model to the one that was picked as the best model. But again, still a very nice way to play around with your model and see what happens as terms enter or are removed, just by clicking on those confidence intervals. Then after we moved on from the analysis, what was that tab we had? The Predict tab was where you could see which types or which things were the best. The best looked like it would probably be a dirt metal construction, and differently for us, since it was like, the hard and light was like… Did it matter or not really? Not really. It would be like you could do hard or light with your paper. I should mention here, it was interesting to see; she hadn't really seen the prediction profiler so much before. I mean, she wasn't familiar with it. She did have to be told to click in there to see what happens. But even for a seven-year-old, it's interesting to see that once they have that sense that they can click within that prediction profiler, she was really able to get the hang of it. Just some final thoughts on Easy DOE. I asked Rory a few questions ahead of time. What would you like to tell people about Easy DOE? It was really fun. Yeah. If you were to do this experiment again, would you change, or what would you change? The factors, maybe different days for the weather. You think it might be windy on some days and not on others.
From my own perspective, you know, she was actually able to complete this with minimal help from me. I mean, she was in control of the Easy DOE platform the entire time. A lot of these different choices she was making on her own, even when it came to the factors that she picked. As well, she actually did help us find some usability issues. There were pieces, like in the Design tab, that I think we improved because of users trying this out, not just her, but other users that we had in the DOE program as well. The model she definitely needed help with, but the analysis was easier than expected. Just some references and acknowledgments. Really, I just want to thank all the members of JMP that helped in the development of Easy DOE. There's a huge list that you'll actually see. We have a Discovery Americas presentation there as well, where we talk about this in a little bit more detail. Again, thanks for all the feedback from external and internal users that have seen this before the release of 17 and since it's been released. Thank you for your time and joining us today. We hope you'll join us during Discovery, where we can discuss this poster. Not sure yet if you'll be able to join us, but I definitely will be, and hopefully you as well. Thank you. Thank you.
Labels
(12)
Labels:
Advanced Statistical Modeling
Automation and Scripting
Basic Data Analysis and Modeling
Consumer and Market Research
Content Organization
Data Access
Data Blending and Cleanup
Data Exploration and Visualization
Design of Experiments
Predictive Modeling and Machine Learning
Quality and Process Engineering
Reliability Analysis
0 attendees
0
0
0 attendees
0
0
JMP Into Retirement Using the Reliability Platform in JMP® (2023-EU-30MP-1276)
Saturday, March 4, 2023
The powerful Reliability Platform in JMP® is often overlooked and underutilized. This talk demonstrates some of the basics of the Reliability Platform in JMP by answering the seemingly simple question, "How long will my retirement endure?" Why guess at this extremely important figure when planning for retirement? Use JMP to explore this question! This was accomplished using Reliability and Survival Fit Life by X and historical data from retirees. Uncertainties of this prediction were also quantified. The optimum retirement age was addressed, considering that retiring earlier draws less income for a longer period. In addition, JMP was used to visualize the employer's retirement tool to optimize the most financially desirable retirement age and explore the most enjoyable chapter of life — retirement! Hello , my name is Don Lifke , and I'm with Sandia National Labs , I'll be presenting on Reliability Platform in JMP . The example I'll be using has to do with jumping into retirement . Sit back and enjoy . And let's see where this leads us . A little bit about Sandia. We are , federally funded research government owned, contractor operated facility . I 've been at Sandia since about 2005 , roughly. I'll tell you a little bit about what we do at Sandia. We work primarily on nuclear deterrence . Six major programs that we're working on right now, I won't go into the details of these , but all of these programs that you see on my screen are programs that I worked on and have actually applied JMP to all of these as well . The reason I'm presenting data on retirement is because a lot of the stuff that I do is just too sensitive and I can't present it in this particular environment . So, we're going to use some fun data that everybody can relate to . Naturally , this is a recording , so you hold your questions to the end . But this is the same presentation I'll be giving live at the conference . L et's talk about retirement . Why is it something that is so certain but has so much uncertainty in it that makes it a little bit hard to plan ? I'm not sure I would want to be without that uncertainty . I prefer a little uncertainty in that . It does make planning for retirement a little bit difficult . But I guess on the bright side , if I were a cat , it would be even more complicated. By the way , that the GATO here is a repairable system , and that's a whole new topic in general . We're just going to be doing systems that are not repairable , like cats . What we can do is we can use the Reliability and Survival tool in JMP. Some of the screenshots that you see maybe are from an older version of JMP. This was created pre-COVID . Now that we are on newer versions of JMP there might be some slight differences in what you see and what I've read from the screenshots , but should be fairly similar anyway . But we're going to be using the Reliability and Survival and the Life Distribution and Fit Life by X in JMP to do some analysis of some data. I'm not sure about where data is coming from . At Sandia , we have a Lab News that comes out, every couple of weeks and in that Lab News they like to post retiree deaths of the people that we worked with for a long time, just to let us know that they've moved on and gone to a big R&D facility in the sky . So, I thought I could use that data to help me plan for my retirement . I grabbed some of that data and I took data from four different periods , of Lab News archives . I pulled some data from 2001 , 2007 , 2013 and 2018 . 
The number of data points that I grabbed from each of those is here . We'll look at some of these data and see how things are changing through time as well . But anyway , the number of data points is pretty significant . This little clip , this little picture on the right is a little trivia there . If you look real closely at the violinist on the left , some of you might recognize him or let's leave that up to you to figure out . Maybe I'll sing it . It's actually my partner , Claire . She's a pharmacist . But also a beautiful opera singer . T he nice thing about using the retiree death data for me is that that population is a better representation of my lifestyle . They tend to take less risks , lead little more conservative lifestyle and a similar income and education level , of course , where the best and the brightest are left at the probably at the bottom of that list , but at least I'm in that group . It's an honor to be working with the folks at Sandia . Another nice thing about using the retiree death data is, it only includes those who actually made it to retirement . I don't really care about the data for not making it to retirement . And I apologize to my kids . I'm in Albuquerque , New Mexico right now . For those of you who love Breaking Bad and know that Breaking Bad was primarily filmed here in Albuquerque , and so I'm a big fan of the show , so I've got to put a little bit of fun stuff in the presentation . That's why I threw a little bit of Albuquerque reference in there for those of you who are Breaking Bad fans as well . And some Better Call Saul stuff to . Let's go right into JMP and start doing some of the analysis . I'm going to bailout of that and I'm going to open up my data file . Let me see some screens around here . It'll take a second to load on your screen . You should be seeing my data file . What you see are the columns of age in the Lab News . That's the newspaper that I got the data from. The bi- weekly date and I broke that down to year . And these other columns I'll talk about in a little bit . Right now , let's just look at the age and the year . Let's just look at the distribution of the data and see what it says . If we analyze the distribution of age , I'm just going to analyze distribution of age in general . If we look at that , this is what the distribution looks like . You can see a sort of a skewed left distribution and we can really see a little more detail on this if we go into some of the display options in JMP. This data actually best fits Weibull distribution , a two parameter Weibull distribution , which I'll show on some of the future screens . But what we want to do is take this data and use the Life Distribution to look at it . I'm going to drag over my PowerPoint here and show you that I set these distributions in the interest of time rather than just actually doing it . L ooking at the distributions of the four years , the four categories of years , which are , 2001 , 2007 and 2013 , 2018 , all of these fit a Weibull distribution fairly well . I wanted to check my assumption when I start analyzing the data in Life Distribution . Then I fit them to a Weibull distribution . Let me open up ... Let me give you a little bit of background on the Weibull distribution , that will help you understand some of the data that I'm going to show . This is actually a Weibull distribution... Weibull distributions generated using JMP's formulas , actually . 
I've taken distributions with different alphas and different betas just to show you what those parameters do with the Weibull distribution. Let me run my script on this and show you what these look like. Basically, I'm just doing a Fit Life by X in JMP, and I saved the script to the data table. This is primarily just to show you what happens with Weibull distributions; we'll turn on a local data filter here and show you what the parameters do with different alphas and different betas. If you have Weibull distributions where... I'm going to choose the three that have a beta of one. The beta of the Weibull distribution essentially determines the spread of the distribution; think of it in terms of standard deviation. When you plot the data on a Weibull curve, you will see that they are all parallel lines, and they're basically just scooting across the X axis here. The beta is the slope of this line, which is the spread of the data. If I look at three curves that have the same alpha but different betas, you'll notice that they're all centered at about the same point. What the beta is doing is changing the spread. Alpha is changing the location. They all cross this point, which is actually at 0.632. I'll talk about that here in a little bit. The characteristic lifetime is the alpha, and that's the point where your line crosses 63.2%, which comes out to be one minus one over e if you want to get into the proof of it: at a lifetime equal to alpha, the Weibull CDF is 1 - exp(-(alpha/alpha)^beta) = 1 - e^-1, about 0.632, no matter what the beta is. The beta is the spread of the data; the alpha is the location of the data. Scale and location is sometimes what they call it. Just a little background on that. I'll swing this PowerPoint back over here and show you a summary slide of what I mean. These plots have the same alpha. They're all centered at the same place. The three that have the smaller rectangle around them have the same beta, so they have the same spread, but they're located differently because they have different alphas. Just a brief tutorial on what the Weibull distribution parameters do to the curve. Let's just look at all the data together now, and let's just look at the age parameter. If we analyze the data in Reliability and Survival and just look at the Life Distribution of age... What we see is linear scales on the X and Y, but we can actually determine which distribution fits best here. And what we find, of course, is that the Weibull fits best. This is what the Weibull distribution looks like on all of that data crammed together. Using the distribution profiler, I can actually manually scoot this over and say, I want to be 90% sure I don't run out of money. I better plan on a retirement age of roughly 92 or so. About 92 to be 90% sure I don't run out of money. That was really the focus of this study, trying to help me plan my retirement age. You know, they always ask you how long you want to plan your retirement for. Of course, we don't know. Usually we just take a wild guess, but this is a little bit of data-driven decision making going on here. What if we look at these through the four different years? We can use the same platform, Reliability and Survival, Fit Life by X. We can look at the age versus the actual year here. Let's see how the years look different. Let me turn on the density curves here so you can see where those fall. I want this to be a Weibull distribution. We can also turn on quantile lines to see how things change through time.
Here are the 10, 15, 98% quantile lines. It looks like maybe from 2001 to 2007 things didn't change a lot, but by about 2013, maybe people are living a little bit longer, based on the 2013 and 2018 data. We go down here and look at the details of this Fit Life by X. You can see the plots all separately here. These little profilers are fun to mess with. You can see the age and the probability of failure. I'll just put a 90 in here. We can see that it looks like we are getting a little bit healthier, because this probability is going down through time, at least for 90 years old. There's a lot of stuff in here that you can tinker around with. I don't have the time to show you all of that, but what I want to get down to here is this location and scale test. This little test here is telling me, are my locations different, assuming we have the same failure mechanism, which in theory we do, right? We all have the same failure mode, and the physics of human failure should be constant, so our beta shouldn't change much. But assuming we have the same betas, are the alphas changing? In other words, is the location of these changing? And the data says yeah: we reject the null hypothesis that the locations are the same, so the data are scooting across this X axis through the years. There's a change. Now, looking at the location and scale together, it looks like that's marginal. I'm going to talk more when we get down to the Weibull here, the actual Weibull test. The Weibull test is looking at whether there is a difference in the beta, in the slopes, in the spread of the data, year to year. It's right on the edge of rejecting the null hypothesis that the slopes are actually the same. So if you want to look at different distributions through the years, you can actually do a statistical test to see if the distributions of your reliability data are changing through time. Right. So, what am I forgetting? Let me slide back over here to the PowerPoint. What I'm not doing is considering censored data. I don't have the data for everybody who's still alive. There are retirees who are still alive, but I just don't have access to that data. So I thought, well, what's that going to do to my analysis? Well, I can play around with the data I have and go back in time. For example, I can go back to 2007, and I can treat the 2013 and 2018 data as censored data, because I know that those retirees were still alive at that point, just to see how it affects my analysis, the fact that I'm not including censored data. I did do that. I went back and messed around with that. I'll show you a little bit of it, but I didn't really find out much. As our great friend George Box said, all models are wrong; the practical question is how wrong do they have to be to not be useful? And I really didn't find anything useful in doing that. But I will show you at least what I did to convince myself that not having the data for the retirees who are still alive is not a problem. If we look at the... Let me tabulate my data real quick. If I look at the age and the year of the data versus whether or not I suspended the data: what I did is I took the 2018 data and suspended it and worked my way back to 2013. We did an analysis on 2013 using the three years of data and then using the 2018 data as suspended data.
Of course, I had to take the ages from 2018 and subtract five from them, because that would be the age in 2013. In order to include that suspended data, I had to adjust the ages accordingly as well. But I also did that for 2007. Let me show you in tabular format: I created a column called Suspend 2007 where I took the data from 2018 and 2013 and also suspended it. So I've got my data of known deaths, and about half my data as suspended data, people who were still alive. You can see that... Let me show you what that did to the data. For the ages in the 2013 still-alive data, you can see that it's basically the 2018 data minus the five years. If I look at the actual age here, at the mean age, this is basically the mean age for the 2018 data; by suspending, I basically just subtracted off the five years and treated it as suspended data. And the same goes for the 2007 data. There, I believe, I simply subtracted off nine years, and five years for the 2018 to 2013 data. When I ran the analysis of those, I'm just going to show you on the PowerPoint slide what it looks like, because it's more convenient than jumping back and forth between the two to show you the difference. What I found was that this was my original data, and then this is the data treating the 2018 data as suspended and doing the analysis on the 2013 data. You can see that the area I'm really concerned about is where I'm crossing that roughly 90% probability, and it didn't change much at all. And then this is treating the 2018 and 2013 data as suspended during the analysis based on the 2007 and 2001 data. You can see that it really doesn't change the curve much in the area that I care about. That gave me the comfort that I really am not missing much by not having this censored data. How did I calculate my 90% confidence intervals? I'm going to just show you in PowerPoint how I did this, in the interest of time. I can use the quantile profiler, look at the 5% and 95% probabilities, and calculate my 90% interval. In this case, it's 66 years old to 96 years old. But really what I care about is the upper limit; I only care about my one-sided 90% probability, and in this case it was age 94. Based on this analysis, I now know that if I want to be 90% sure I don't run out of money, I should plan to live to be 94. This is based on my historical data, quantifying that uncertainty. This is basically... a little bit of humor thrown in. The next phase in this was to try and figure out... I'm going to throw in a little bit of bonus material here beyond the Fit Life situation. Here we have this tool called the Pension Tool. We can actually put in the year we're going to retire, and we can also assume a salary increase and a non-base award, which is essentially what you would call a bonus in private industry. We can put those into this tool and it will spit out our estimated monthly pension. I thought, well, that doesn't help me much. What I want to do is reverse-engineer that tool so that I have a profiler to tinker with. I went into JMP and created a Response Surface design, and I input three different ages at retirement: 62, 65, and 68. I did three different salary increases and I tried three different non-base awards as well. Let me pull the data over here. This is what the data looked like when I ran the experiment.
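For reference, a full quadratic response surface in those three factors (retirement age A, salary increase S, non-base award N; the symbols are ours, not the speaker's) has the form

\hat{y} = b_0 + b_1 A + b_2 S + b_3 N + b_{12} AS + b_{13} AN + b_{23} SN + b_{11} A^2 + b_{22} S^2 + b_{33} N^2

With three levels of each factor, the 27 all-combinations runs the tool was queried at are enough to estimate all ten coefficients, which is what the fitted profiler shown next is built on.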
I'm just going to show you the screenshots from JMP rather than do a live JMP analysis, because I'm at about 20 minutes already and I want to make sure I have enough time to cover all of this. When I created this experiment, you'll notice that the runs are sorted. That's okay because I'm doing a computer simulation, so I don't have to worry about lurking variables like the temperature in the room, the humidity, and the operators. I'm going to get the same answer regardless of the order that I run this, so it was easier, working through the tool, to run them in order. But when you do a real designed experiment, you want to randomize your runs. Do not run them in order out of convenience. If you do feel that your setup constrains you, you can use the easy-to-change versus hard-to-change factor settings in JMP to make sure you handle that properly with some blocking. My inputs were all orthogonal. I covered all combinations of my age at retirement (the three different ages), the three different non-base awards, and the three different salary increase percentages. This is a look at the design diagnostics you can do when you're setting up your experiment in JMP. You can make sure that your main effects are only correlated with themselves, and that there's minimal correlation with the other variables, the two-factor interactions, and the squared terms, since I did a Response Surface model. The net result was a nice prediction profiler, and I'm going to go ahead and just run this and show you. When you set up an experiment in JMP, it's nice because it gives you these scripts already in your data table. I'm just running the Model script from the data table, and JMP says, oh, I know what you want to model, because I was here when you set up the experiment. It gives me my Y and it gives me my Xs, and it runs my analysis for me. I can now use the profiler in JMP rather than having to put an age, a salary increase, and a non-base award into that tool and getting one number out. Now I have the profiler; I essentially reverse-engineered their tool. I can see what my monthly benefit is going to be versus salary increase and non-base award, where you see that the non-base award doesn't really have much effect. Salary increase, not much effect. And of course age, as we would expect, has the most effect. What do I do with this information? Well, what I really care about is my lifetime benefit, and I'm also concerned about inflation. So I added some formulas to this. I added a lifetime benefit, which is just my monthly benefit times how long I'm going to collect it. If you want to look at the formula, you can see it's basically my monthly benefit times how long I'm going to live. That's my lifetime benefit, and that's what I really care about: how much money am I going to get in my lifetime? I know if I retire earlier, I'm going to get less money. That's a no-brainer. But I want to know, is there a point of diminishing returns? When I looked at the age at retirement, I created three separate columns for my lifetime benefit assuming I live to 80, 84, and 98. All these data will be provided to you as well, if you want to tinker with them. But what you can see is the lifetime benefit by retirement age, and there is actually a point of diminishing returns, where I might as well just retire at 65, because I'm going to get less money, but I'm going to get it for a longer period of time.
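The two derived quantities mentioned above can be written down directly (a sketch; the talk does not show the exact formulas). With monthly benefit B, retirement age r, and an assumed lifespan L, the lifetime benefit is just the number of monthly payments times the payment:

\text{Lifetime benefit} = 12\,(L - r)\,B

and the inflation-adjusted version discussed next is the standard present value of that stream of n = 12(L - r) payments at a per-period inflation rate i:

PV = B \,\frac{1 - (1 + i)^{-n}}{i}

which reduces to n times B when i = 0, matching the zero-inflation remark that follows.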
Now, as that assumed lifespan starts to increase, if my lifetime goes to age 84, it turns out my lifetime benefit would have been a little bit better hanging on there. And as I live longer, of course, I'm better off waiting as long as I can to retire. But I really don't know what that number is. My best guess is 84; that was the fiftieth percentile on my fit. If I want to be really conservative, I'm looking at 90. But I also wanted to look at how inflation matters. I looked at the present value; I put a formula in JMP. There's actually a formula for present value. Let me explain [inaudible 00:26:05] I'm going to show it to you real quick. In JMP you can use a formula similar to what Excel has and calculate this present value, and that adjusts for inflation. This is what your money's worth now, relative to what you're predicting the future annual inflation rate is going to be. At 0% inflation, this present value is just your lifetime benefit: the number of payments you're going to have through your lifetime times the payment amount, essentially. But you get penalized as inflation goes up; it becomes less and less. If I look just at the inflation-adjusted present value data and model that, what I noticed was... I'll make these a little bigger so you can see them. I noticed that when my salary increase is smaller, lower... this is a two-factor interaction; you can see the slopes changing. With small salary increases, which are probably not unrealistic in the coming future, there is a point of diminishing returns when we penalize for inflation. There is not that big of a benefit to waiting to retire. Once I hit about 65 or so, it starts to flatten out. Really, the difference isn't that big. If we look at the predicted value, these are not my actual numbers; these were based on a representative figure of $4,000 a month, which is a typical retiree income. I noticed that as I wait, it flattens out, so it doesn't really pay to wait to retire. I'm using this information to help me decide when I really want to retire. Right now, I'll be 62 in May, so I'm starting to approach the ability to do this. I'm probably going to wait till 64, 65-ish, to where the difference in my present value is not that big. That's a quick bonus on how I used the design of experiments feature in JMP to reverse-engineer this. In summary, I looked at the retiree death data using the Life Distribution and Fit Life by X platforms. I looked at four different time periods and noticed that maybe we did get a little bit healthier by 2013, but by 2018 that had flattened out. I do have some 2021 data that I don't have in this presentation, but maybe next year at the conference in Spain in March; it's actually pretty flat from 2013 to 2018 to 2021. We're really not getting healthier or living longer. That's going to help me with my decision. I took a custom designed experiment, as bonus material here, and reverse-engineered this web-based applet that we have, using the profiler to replace that one-data-point-at-a-time tool. A really cool thing you can do in JMP. I would take questions here if we were live. And with that, I want to say one last thing. I really want to dedicate this presentation to my brother, who I lost a couple of years ago to brain cancer, and who was only 15 months younger than me. He was a fellow Sandian and also a fellow JMP user.
Some of you may have met him at the JMP conference and so I'm dedicating this to him . Thank you very much .
Labels
(7)
Labels:
Advanced Statistical Modeling
Basic Data Analysis and Modeling
Data Exploration and Visualization
Mass Customization
Predictive Modeling and Machine Learning
Quality and Process Engineering
Reliability Analysis
0 attendees
0
0
0 attendees
0
0
Don’t Lose Time Anymore: Automated Access, Visualization and Evaluation of Production Environments (2023-EU-30MP-1347)
Saturday, March 4, 2023
This presentation demonstrates how to access different production plant data to look into all corners of a production plant for quality assurance. JMP was used to access production, process, and quality data with one click. The presentation shows how data accessed via REST-API from an existing application was optimized by data import, data cleanup of input tables using JSL, and visualization of automated dashboards. Specification and control limit management for automated reporting was critical. Also, where data import possibilities were limited, a combination of Python and JMP was used to import process factors and responses for each production step of interest in mass production plants. A full range of quality assurance tools will be demonstrated, which could be used to begin helpful discussions with production teams for continuous improvement plans and PDCA Cycles. Hello, everybody. My name is Ole Lühn and I am a member of Global Quality Assurance at BASF. Today at the JMP Discovery Summit Europe 2023 in Sitges in Spain, I want to talk about my topic, which is called Don't Lose Time Anymore: Automated Access, Visualization and Evaluation of Production Environments. So, how to get fast access to your production sites. Here in the background, we see the site I'm working at, which is BASF Schwarzheide in Germany. On this slide, you see all the sites of BASF worldwide. There are Verbund sites, which are combinations of different production environments, and we have R&D centers, production sites, and regional centers. For Europe, I am located in Schwarzheide. On the next slide, I will show you in a little more detail where Schwarzheide is located. I work closely together with different sites, in particular Ludwigshafen, which is the main site of our company with approximately 39,000 employees. Schwarzheide is located roughly 140 kilometres south of Berlin, and we have about 2,000 employees. Today we are at the JMP Discovery Summit in Sitges, which is located a little bit north of Tarragona and south of Barcelona. I explicitly showed Tarragona here on the slide because I have also worked with them; that is where we formulate our end-use product, which we sell to the market. In the upper right corner, you see a glance at the European sites of BASF; for the color code, please see the previous slide. I am working in Schwarzheide, and here you can see a picture of the site as of today. The plant I work in and work for is in the orange rectangle shown on this picture, and the picture on the lower left shows the view from my former office. I could look out of the window and see the plant I was working for, since at that time I was still located in a different office. Of course, I want to know more about what is happening in the plant, and that is what my talk is about: having access to data in the plant when you're not currently working there. So, the introduction of BASF and me: I am a member of the business unit Agricultural Solutions, in the team of Global Quality Assurance. I have been a JMP user since roughly JMP 9, so it must be since around 2010. My task is to assure production processes in terms of quality, so quality assurance. I also work in quality management, I am an auditor for different ISO norms, and I am involved in nonconformance management and deviation management.
For the final product, I'm responsible for the release, and I also have to sign each COA that we pack and that we pack to the product shippings around the world, and on each COA, I have to sign. So I want to know what is happening in the production. So from a statistical point of view, I want to know if there are differences in the production or in the production process and the production environment. From a practical point of view, I need to evaluate if these differences do really matter. I want to know the details of the production as fast as possible and as easily available as possible, and that's why I started this topic a while ago. My goal is to have a proper root cause analysis, preventive and corrective actions in nonconformance management. The work started roughly 2020 when we all realized that the corona pandemic was developing. And like in many companies, there was a restrictive access to the plant. So while I was sitting at home, I was asking myself how to get access to the data from the plant without being on site. So how can I see what happens in the plant when I'm not really sitting in the plant? And how can I be part of the production and the production teams when I'm not available around? So the idea is that I go digital to the plant and realize what happens. Without the data access control V and control V, which takes everybody notices in our community days, maybe, or hours to get a real evaluation done. So my idea came up while I was participating on a seminar on the tablet production process in the JMP lecture courses, and my goal was to have the highest degree of automation in my data access in seconds. Here is what it is all about. So here on this picture, you see a general overview of our plants. In fact, we have three plants which are producing parallel our product. And the very general value stream is shown on the top of this graph. We start with raw materials, we have intermediates in our process, and we go to the final product. So this data of the last three processes is available via a REST-API interface where I can get the data from. All the data is prepared in one table. I have the time, I have my final lot, I have the QC data of my final product, and for my intermediates, I have different factors and different responses available. What I mean with factors and responses is shown on the upper right picture, which I took from the JMPs course. This work was started with a colleague from me, Bernd Heinen, who helped me out in writing the scripts and preparing my data access. This is what I want to show you now in JMP. We prepared our scripts in a very easy way. We put the REST-API address here in our scripts. We do some scripting instructions. I had a problem with the data time format when this table was imported into JMP. I was asking in the community and someone gave me the solution for my time formatting problem. I add some specs to my columns and I save it on a drive. I always save it as an Excel file as well. The data which is shown here is the data for one of my plants. I opened this table now. It's table number two. Here I have my time of different processes in the plant. I have the lot of my final product. I have the response, the QC data of my final product, responses and factors of previous processes. Firstly, I'm interested in the final product. I started to plot the data in the control chart of one of my responses for my final product. However, it is an X -bar and R -chart. 
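A minimal JSL sketch of that kind of import script, with a made-up endpoint and path (the real address, authentication, the date-time format fix from the community post, and the column specs are omitted); JSON To Data Table is assumed to be available, as in recent JMP versions:

// Pull the prepared table from the plant's REST-API (hypothetical URL)
request = New HTTP Request(
	URL( "https://plant-data.example.invalid/api/qc/final-product" ),
	Method( "GET" )
);
body = request << Send;
dt = JSON To Data Table( body );   // one row per lot, factors and responses as columns
// Keep a copy on a drive as well, as described above (the real script also writes an Excel copy)
dt << Save( "$DOCUMENTS/plant_qc_data.jmp" );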
The reason for this is that from my last intermediate, five lots are put together in the final reactor, and therefore, the X -bar and R -chart is created, and not the individual and moving range chart, which I am interested in. So I need to do a little bit of data preparation work. The outcome of the data preparation work is shown on the lower part of the slide where I have the access of all three plants parallel, I can do my data preparation. I use the function in JMP to concatenate these data tables and I can plot easily an overview of my final QC data in this video. I use the Fit Y -by -X platform for the three plans, use the column switcher and recorded a video. Everything is possible within seconds. The idea of my work started when I had a look at the tablet production processes or the tablet production process from the JMP lecture. Here we have this red triangle which shows us suspicious values in a production trend. My idea was to also create in my data table a column where I add suspicious data or suspicious results and color it in a binary format. I changed my continuous variable into a binary one, like good or bad. For me, it is more easy to discuss this with the team from statistical point of view and everybody in our community knows this as well. There are more powerful ways in order to get this data analyzed for regression, for example. However, I choose this way. I will show you what I mean. I prepared this in my slides. I wanted to have these indicators, good or bad, or suspicious or not suspicious in my data table, so this data table is taken from the JMP library, the tablet production process. And in the end, I had my data prepared like this. I added a column to my data table. Here, because I had five lots of my intermediates, I needed to split my data first and to evaluate only the final product. I added a column where I evaluated on my individual and moving range charts how is the data distributed? And here we can see that my internal specification is far above my data. However, there are suspicious events in the production process. And this data I can also plot in a dashboard here where I cut off above the upper control limit my data. With this data prepared, I can have easy discussions with my team and tell them, "Look, here, what happened here? I just added time and say roughly around this time, we had a suspicious event here in our factory. What happened?" Let's go back to the JMP table or to the presentation. I created this column, good or bad. And from a statistical point of view, we know that regression is better. The script in order to evaluate all this data at once within 20 seconds of time is containing actions like to split the data, make a summary, analyze the control limit, summarize my control limit in an extra table, introduce this column. First I did it manually, but I now had a nice script to do it automatically. I merged my data, my summary back into my first data table, which I extracted from my REST-API interface here, and it's done. Everything was in seconds, and from the JMP team, Florian helped me to do this here. Here is just the proposal for me and for people who work in the quality department, how to discuss these graphs with the team. Of course, there are different ways how to analyze data in JMP. I like most the Fit Y -by -X platform, and I also like the hypothesis testing ways which are shown on the video here on slide seven, where I plot the three plants parallel to each other and go through all my QC data of interest and compare it. 
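A sketch of that good/bad flag, computed from individual-and-moving-range control limits on a hypothetical Response column (the actual script, written with help from the JMP team, is not reproduced here):

dt = Current Data Table();
n = N Rows( dt );
// Average moving range of consecutive results
mrSum = 0;
For( i = 2, i <= n, i++,
	mrSum += Abs( :Response[i] - :Response[i - 1] )
);
mrBar = mrSum / (n - 1);
// Upper control limit for an individuals chart: mean + 2.66 * average moving range
ucl = Col Mean( :Response ) + 2.66 * mrBar;
// Binary column that is easy to discuss with the production team
dt << New Column( "OOCFlag", Character, "Nominal" );
For Each Row( :OOCFlag = If( :Response > ucl, "suspicious", "ok" ) );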
Furthermore, I also like the process performance graph which is available in JMP. But one way of analyzing my data is also that I can easily plot Pareto plots of my out -of -control events. I have the data prepared. We go to the table. We have only the values which are not okay or above the control limit selected. I click on Pareto Plots and I can discuss this easily with the team and tell them these two variables are most of the time out-of-control. What do we do? I can also compare three plants at once immediately. It's more or less the same, but I can have a look here and create meetings and discuss about suspiciencies between the three factories. So obviously the factory which is colored red, there the less out-of-control events are created. This is part of my story to have fast data available here to to be discussed in the teams. Of course, I know that there are different options in JMP, so we can do regression, we can do predictor screenings, and the data table for the predictor screening is also shown here. In fact, I have my final data table which I extracted from the REST -API interface, and I have this out-of-control column added to my previous data, and I can do a predictor screening and I can have a look. Where are the factors in the plant which influence the outcome in my final product first? Of course. Okay, he's doing the calculation at the moment. Of course, these kind of things are difficult to discuss, and it's also not part of my job to discuss here about process suspiciencies. My topic is quality and I want to reduce the number of off -spec events in the plant. I'm talking about off -spec and not out-of-control anymore. We had a time... I go to the right part of the slide. We had a time in the plant where we created off-spec events. I did a partition with the data and the data told me, "Look here, this is suspicious, below 12,000 or 13,000. Below this value, it is most probably that we create off-spec events." Here in the graph builder, these events are marked red. Obviously, this is important. It is a summary of chemicals that we put on a filter unit in the plant. I will show you the video soon. The second step is that in the partition, we came to the conclusion that above a certain solven t dosing, it is most probable or it is likely that we produce off-spec lots. There are two results from this evaluation. Don't put too much and be careful with the solven t dosing. Now I want to show you the video. During the this time, we sent a colleague to the plant and he took videos out of this reactor. The red rectangular is actually more or less the amount of what we put here on the filter unit. The green rectangular is the amount of solvent, which is dosed afterwards. If it's too few put on the filter, and too much solvent, obviously we created here a problem. After we discussed this with the team, it was about one -and -a -half years ago, we did not have these events anymore in the factory or in the plant. So for me, it was a quite successful story how we came here to the conclusion that we can be more careful in our plant about our processes. My primary result of this first part of my talk is that via the scripts and the way I imported data to JMP via the REST-API interface, it is possible to sharpen the awareness in the production with respect to deviations. Of course, there are more powerful regression possibilities in the tool which we are using, but it is not my job as a quality manager. 
The goal is to hand over here a powerful tool which we can use in our factory and plant for deviation management and root cause analysis. My idea is if someone is not wanting to use our tool JMP or wants to use a different tool, I can save always the scripts or the files as Excel tables. However, only the last three processes are available here, and it is limited because this is a project which was started 2017 to prepare all this data correlation and time between these three processes. However, what do I do if I'm interested in previous processes like the raw material or my first intermediate? How can I get this data in to jump from the outside of the plant? Therefore, I started to use and to investigate the interface and the possibilities to import data via a connection between Python and Azure. It started all when we had the manufacturing execution system changed by the end of 2020. The way I tried to go was stopped because this manufacturing execution system was stopped. I thought, "Was everything for nothing now?" No. I found a paper which was written about the synergies from JMP and JMP Pro with Python. The concept is shown here on the left side of the slide. In fact, I have a JMP script, I have a Python part, and I have an option in this Azure where I do my data access. Everything is also possible within seconds to get the data out of the plant. All is written here in the scripting guide, page 786. These are only a few pages. If questions come up, Emanuel from the JMP team helped me here to do this. Here's the concept. Let me move this Zoom picture. On the left part, it's what everybody in the chemical industry knows. Raw materials are delivered in trucks. They are stored in a tank farm and the production has to use raw material, of course. And raw material is consumed during the production processes and new raw materials are delivered and the tanks are refilled again. So we have a certain level and everything is fine. It starts and the way to investigate here and to get the data in to JMP is we have to look up our individual process of interest. Every point of interest in the plant has a certain number, and this number needs to be fined. So you can ask colleagues, you can ask SAP, you can ask the automation team, you can have a look by yourself. You need to find a way in your company who can help you. When you found the number, you have to make the request in Azure. This number I was talking about here, it's named at us, it's the unique ID. I type in here how many days back I want to see the data. I test it in Azure. He creates a table and I put this code into my JMP script and I click on execute. This is what I show you now. It's also done in few seconds only here. This first part you need to get from colleagues or you can look up in the internet. I looked up in the internet and this is not part of JMP support. Here you can see that within seconds I can have a look at the filling level of some of our raw material tanks or the tank farm. This is the concept that any process of interest can be imported like this into JMP within seconds. I wrote a manual or a journal how this can be done, and I will publish it also with my work in the community. If you don't know how to do it, it's written here. Where you can find it in the scripting index, I have a paper here how you test your system. 
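A bare-bones sketch of the JSL-to-Python round trip described above (the Azure query itself is plant-specific and omitted; the DataFrame here is a stand-in for what the real request for a tag ID would return):

Python Init();
Python Submit( "\[
import pandas as pd
# Placeholder for the real call that queries the Azure endpoint by unique tag ID
df = pd.DataFrame({"timestamp": ["2023-01-01 00:00", "2023-01-01 01:00"],
                   "tank_level": [73.2, 71.8]})
]\" );
dt = Python Get( df );   // the pandas DataFrame arrives in JMP as a data table
Python Term();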
I have a few recommendations for the help in the scripting index, and also I found some answers which I placed before in the community, so I hope to have also feedback from me documented there, and I showed you how to extract data from anywhere in the FAB or in the plant. And this brings me to the end of my talk. The summary is that product deviations happen to all of us. So here I show you some quality assurance tools, and via one click, I have all my necessary graphs and information prepared like this to discuss with the team. My message is that don't be afraid from using the JSL scripting language that I was before. When you start, you learn fast and you get the job done. You can prepare good discussions with your production teams and you can start continuous improvement plans and PDCAs, and everything can be available within a few seconds. I also have some ideas how to go on here, and there I need the help from you, so from the colleagues from which are attending the conference. Maybe you can help me automate and schedule my evaluations. I know that this can be done in JMP Live in version 17. If you don't have it, I need to use the Windows Scheduler. For example, I need to improve a little bit by creating add -ins, and the data cleaning still can be a little bit bit more better and improved, but okay. My main goal was to make the members talk more about suspicious events in our factory and out-of-control events, which is the lower right part of the picture of the slide, and less about off-spec events and out-of-control events. From quality management point of view, I really am a fan of turtle diagrams and turtle tools to document your improvements. This brings me to the end of my talk. I hope you enjoyed it as well as me when I was preparing this work, and I'm looking forward to your questions and to see you in Spain. Thank you.
Labels
(11)
Labels:
Advanced Statistical Modeling
Automation and Scripting
Basic Data Analysis and Modeling
Consumer and Market Research
Content Organization
Data Access
Data Exploration and Visualization
Design of Experiments
Mass Customization
Quality and Process Engineering
Reliability Analysis
0 attendees
0
0
0 attendees
0
0
Using Generalized Regression To Analyze Designed Experiments With Detection Limited Responses (2023-EU-30MP-1308)
Saturday, March 4, 2023
Most measurement systems have detection limits above or below which one cannot accurately measure the quantity of interest. Although detection-limited responses are common in many application areas, such as the pharma, chemical, and consumer products industries, they are often ignored in the analysis. Ignoring detection limits biases in the results and even drastically lowers the power to detect active effects. Fortunately, the Custom Designer and Generalized Regression in JMP® make incorporating detection limits easy and automatic. In this presentation, we will use simulated versions of real designed experiments to show how to get the analysis right in JMP® Pro 17 and the pitfalls that will occur if detection limits are ignored in the analysis. We will also show how simple graphical tools can identify parts of the design region that could be problematic or even make it impossible to estimate certain model terms or interactions. Our examples will include an experiment designed to maximize the yield of a chemical product where the response is a reduction in the number of microorganisms in microbial susceptibility testing of consumer cleaning products. Hi, I'm Chris Gotwalt with JMP, and I'm presenting with Fangyi Luo of Procter & Gamble, and her colleague, Beatrice Blum, who'll be joining us for the in- person presentation at the Discovery Conference in Spain. Today, we are talking about how to model data from designed experiments when the response is detection limited. This is an important topic because on the one hand, detection limits are very common, especially in industries that do a lot of chemistry, like the pharmaceutical and consumer products industries. While on the other hand, the consequences of ignoring detection limits leads to seriously inaccurate conclusions that will not generalize. This leads to lost R&D time and inefficient use of resources. The good news that we are here to show today is that getting the analysis right is trivially easy if you are using generalized regression in JMP Pro and know how to set up the detection limits column property. In this talk, we're going to give a brief introduction to sensor data, explaining what it is, what it looks like in histograms and a brief description of how you analyze it a little bit differently. Then Fangyi is going to go into the analysis of some designed experiments from Procter & Gamble. Then I'm going to go through an analysis of a larger data set than the one that Fangyi introduced. Then we're going to wrap up with a summary and some conclusions. What are detection limits? Detection limits are when the measurement system is unable to measure, at least reliably, when the actual value is above or below a particular value. If the actual value is, say, above an upper detection limit, the measured value will be observed as being at that limit. For example, if a speedometer in a vehicle only goes to 180 kilometres an hour, but you are driving 200 kilometres an hour, then the speedometer will just read 180 kilometres an hour. In the graphs above, we see another example. We see five histograms of the same data. The true or actual values are over here on the left, and moving to the right, we see what results when you apply an increasing detection limit to this data. What happens is we see this characteristic bunching at the detection limit. When you see this pattern, it's a really good sign that you may need to think about taking detection limits into account in your distributional or regression analysis. 
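A short note on why that bunching matters statistically (standard censored-likelihood reasoning, not taken from the talk): a value reported at an upper detection limit U only tells us the true value was at least U, so a proper fit treats it as right-censored,

L(\theta) = \prod_{\text{observed } i} f(y_i;\theta) \; \prod_{\text{censored } j} \bigl[ 1 - F(U;\theta) \bigr]

Fitting as if the censored values really were equal to U replaces each survival term with a density spike at the limit, which pulls the fitted response surface toward U and shrinks the estimated spread; that is the bias and the loss of power the next section demonstrates.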
Why should we care about detection limits in a data analysis? Well, if you don't take your detection limits into account properly, you'll end up with very heavily biased results, and this leads to very poor model generalization. The regression coefficients will be way off. You'll have an incorrect response surface, which leads to matched targets with the Profiler being way off. I think the situation is a little bit less dire when maximizing a response, but there's still quite a lot of opportunity for things to go wrong. In particular, Sigma, your variance estimate will still be way off, which leads to much lower power, you have completely unreliable p- values. The tendency is that variable selection methods will heavily under select important factors . The actual impact that a factor has on your response will be dramatically understated if you don't take the detection limits into account. The two tables of parameter estimates that we see here illustrate this very nicely. On the left are the parameter estimates from a detection- limited LogN ormal analysis of a regression problem. On the right, they are the resulting parameter estimates when we ignore the detection limit. We see that the model on the left is a lot richer and that a lot of our main effects, interactions, and quadratic terms have been admitted into the model. Whereas on the right, when we ignore the detection limit, we're only able to get one main effect and its quadratic term included in the model, and the quadratic term is heavily overstated with a value of negative 11.5 about relative to the value in the proper analysis where that quadratic term is equal to just negative 3. We see that we're really missing it on a lot of the other parameters here as well. When we take a look at this in the Profiler, this becomes really apparent. O n the left, we have the Profiler of the model correctly analyzing with the Limit of Detection, and we see that all the factors are there, and overall, the response surface is pretty rich- looking. On the right, we see that only the one factor, dichloromethane, has been included in the model. T he solution to the problem that you would get with the problem on the left is likely rather different from the one that you would get on the right. Thanks, Chris. Now I'm going to share with you a little bit background on the experiment of the data mentioned by Chris, the time to bacterial detection. The objective of that experiment was to understand hostility impacts of our formulation ingredients or factors on a liquid consumer cleaning formulation. The experiment was a micro- hostility Design of Experiment with 36 samples, and we have five key formulation factors A, B, C, D, E. W e have two responses from this experiment. They're from microbial testing. The first one was the one mentioned by Chris. It is time to bacteria detection in two days, and it was measured by hour. I f we are not able to detect the bacteria in two days, then time to bacteria detection is right censored at 48 hours. So the Limit of Detection for this endpoint is 48 hours. Another endpoint is log reduction in mode from Micro Susceptibility Testing. For this endpoint, what we did is that we add certain amount of mold to the formulation, wait for two weeks, and measure amount of mold in the product after two weeks. T hen we calculate the reduction in log base time mold, and this is the second endpoint. Limit of Detection for this endpoint is six unit. T his shows you the detailed data from the experiment, the first 15 samples. 
Y ou can see the formulation factors A, B, C, D, E, and they were from response surface design. W e have two endpoints, the bacteria detection time in hours and the log reduction in mold. The data highlighted in red are right censored data. We can use histograms and scatterplots to visualize our data as well as factor versus censoring relationship. As you can see from the histogram, more than 50 % of samples are right censored at 48 hours. If an observation is not censored, then most of them will be below 15 hours. O n the right, we have the scatter plot. The red circle indicates the censored data points. You can see that we have censoring at all levels of the factors except for factor C. We don't have the censoring at higher level of C, but we observe censoring at all level of the factors. In JMP Pro 16 and higher, we can specify column properties for detection limit. W hen you go to column property, you find detection limits, and then you can specify the lower detection limit and upper detection limit. I f a data point is below the lower detection limit, that means it's less censored at the lower detection limit. If a data point is higher than the upper detection limit, then it means that it's right censored at the upper detection limit. For the bacterial detection time, we have an upper detection limit and it's 48 hours. W e put 48 hours in the upper detection limit box. After we specified detection limit on the column property in JMP, then we can use JMP generalized regression modeling to analyze the data by taking into account the Limit of Detection. So this is a new feature in JMP Pro 16 and higher. For this type of analysis, we need to first specify the distribution for a response and estimation method. W e try different distribution for the data and use the forward selection method, and we found Normal distribution fits the data the best because it has lowest AICc. We can also analyze data ignoring the detection limit. Y ou can see that we will have a much smaller model with five factors left in the final model. T he model ignoring Limit of Detection will have much less power to detect significant factors. This showed you the factors left in the final model from the generalized regression modeling. If we take into account Limit of Detection for the response, or if we ignore Limit of Detection in the response. As you can see, if we take into account Limit of Detection, then we have much more significant factors in the model. W e can only detect the effect of C and D and their quadratic effect in the model if we ignore Limit of Detection for our response. Again, this is comparison of the parameter estimate from the model if we consider Limit of Detection in the modeling or ignoring Limit of Detection in the modeling. Ignoring Limit of Detection in the modeling would give us the bias estimate of the parameter as well. This slide shows you the prediction Profiler of the response if we perform the modeling by considering the Limit of Detection versus ignoring the Limit of Detection. If we consider the Limit of Detection in the modeling, then we get a model with all the terms in the model, the main effects as well as some of the interaction and quadratic terms. T his model makes much more sense to our collaborators. Remember that at lower level of C and at higher level of D, we have more censoring data. That means the detection time is longer and the prediction Profiler showed that at lower level of C and a higher level of D, the predicted detection time is longer. 
A lso because we have more censored data in those region, so the confidence interval for the prediction P rofiler is wider. If we ignore the Limit of Detection in the analysis, we get much less significant factors. Only C and D showed up in the model, and the parameter estimate is also biased. This one shows you the diagnostic plotting of observed data on the y- axis versus predicted data on the x- axis. If we consider Limit of Detection in the generalized regression modeling, it gives correct prediction. But if we ignore Limit of Detection in the modeling, then it will give incorrect prediction for your data. In addition to the prediction Profiler, JMP generalized regression modeling would also give you two profilers similar to those from Parametric Survival M odeling platform. Those are the Distribution Profiler and Quantile Profiler. The distribution profiler will give you the failure probability at a certain combination of our formulation factors and a certain detection time. The Quantile Profiler will give you the quantile of the detection time at a certain combination of our formulation factors and the specified failure probability. T hese two profilers are available in JMP under the Generalized Regression Modeling. But one advantage of using Generalized R egression Modeling to analyze time to failure type of data is that it would provide you the Prediction Profiler, and this type of profiler is much more easier for our collaborator to understand. I t's much harder to explain the Distribution Profiler and Quantile Profiler to our collaborators. Now it comes to the analysis of the second endpoint, the log reduction in mold. Again, we can use histogram and the scatterplot to visualize our data and visualize the factor versus censoring relationship. As you can see from the left histogram, you can see that we have a lot of data that are right censored at six unit. We can see censoring at all level of our formulation factors, except at higher level of C and lower level of E. T his is the region of concern. We have seen a lot of censoring at lower level of C and higher level of E. That means at lower level of C and higher level of E, it's good for the product. We have higher log mold reduction. Again, we can use detection limit on the column property to specify the Limit of Detection for this endpoint. W e used upper detection limit of six in this column property. N ow the next step is to analyze this data using the Generalized R egression modeling by taking into account the Limit of Detection. W e use LogN ormal distribution and forward selection. Interestingly, we found that the RS quare is one and this is very suspicious. A lso, we see some red flag. The AICc had a severe drop after step 17. T he standard error of the estimate as well as the estimate for the scale parameter seems to be extremely small. A lso, the diagnostic plot showed perfect prediction from the model. W e know that the model has overfit. This is the Prediction Profiler, and they showed very narrow confidence interval for the prediction, and we knew that our model is overfit. So what we did for the modeling is that we tried a simpler model by removing the quadratic terms from the initial response surface model. We found that LogN ormal with forward selection model fits the data the best because it has a lowest AICc and BIC. T his time, the solution path looks more reasonable as well as the standard error estimate of our parameters and estimate of the scale parameter of the LogN ormal distribution. 
T he diagnostic plot looks more reasonable now. This is the Prediction Profiler of the final model after we removed the quadratic terms. This Prediction Profiler makes a lot more sense. Recall that at lower level of C and at higher level of E, we have more censored data you can see here. That means at lower level of C and higher level of E, we have higher log mold reduction. It showed on the Prediction Profiler because it has more censored data in this region and the confidence interval for the prediction is wider. We can also compare the final model Prediction Profiler if we ignore Limit of Detection in the modeling. If we ignore Limit of Detection in the modeling, we got less significant model factors as well as biased results. If we ignore Limit of Detection in the Generalized Regression modeling, then the second model, which is incorrect and is trying to use the quadratic term to predict in the lower level of C and higher level of E. So t rying to get the predictive value close to the Limit of Detection, and we knew that this result is biased. Fangyi has nicely shown here that the incorrect analysis, ignoring the Limit of Detection, leads to some seriously biased results. And that getting the analysis right is easy if you set up the detection limits in either the custom designer or as a column property. I'm going to go through one more example that has measurements at different times, which adds a little bit more complexity to the model set up, and in our case, required some table manipulation to get the data in the right format. Here is the data table of the second DOE in basically the form that it originally came to us. In this data, we have 8 factors, A through H, and the data has measurements at 1 day, 2 days, and 7 days. Originally, our intent was to analyze the 3 days separately, but when we fit the day 7 data, the confidence intervals on the predictions were huge. It was apparent that there was so much censoring that we were unable to fit the model, and so we were either going to have to come up with another strategy or back away from some of our modeling goals. What we ended up doing was we used a stack operation from under the tables menu so that the responses from different days would be combined together into a single column, and we added day as a column that we could use as a regression term. In the histogram of log reduction, we see the characteristic bunching at the detection limit of five. Combining the data like this certainly seems to have improved the impact of censoring on the design and hopefully allows us to make more effective use of all the data that we have. As in the previous examples, we start off fitting a full RSM model, but in this case, because we have day as a term, we add a day and interact all of the RSM terms with day in the Fit Model Launch Dialog prior to bringing up the generalized regression platform. Again, we're going to use the LogN ormal distribution as our initial response distribution. Because this is a large model, we can't use best subset selection, so we used pruned forward selection as our model selection criterion. We try the LogN ormal, Gamma, and Normal distributions, and clearly the LogN ormal comes out as the best distribution because its A ICc is 205.3, which is more than 10 less than the second best distribution, which was the Normal, whose A ICc was 257. Here, the model fit looks really reasonable with nothing suspicious. 
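The stacking step described a moment ago might look something like this in JSL, with hypothetical names for the three day-specific response columns:

// Combine the day 1, 2, and 7 responses into one column, with Day available as a modeling term
dtStacked = dt << Stack(
	Columns( :LogReduction_Day1, :LogReduction_Day2, :LogReduction_Day7 ),
	Source Label Column( "Day" ),
	Stacked Data Column( "Log Reduction" )
);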
The solution path, standard errors, scale parameter, and the actual-by-predicted plots all look pretty good and realistic. There's a little bit of bunching down at the low end of the responses, but the thinking is that wasn't due to a detection limit, just part of the discreteness of the measurement system at lower levels of reduction. Now, if we repeat this analysis ignoring the detection limit, it guides us towards the Normal distribution. Here we see the Profilers for the model that incorporated the detection limit on the top and the model that ignored the detection limit on the bottom. As in the other examples, we see that the sizes of the effects are dramatically muted when we ignore the detection limit, and we get quite a different story: there's a strong relationship between log reduction and factor E when we take the detection limit into account properly, and that effect is seriously muted when we ignore the detection limit. If we compare the actual-by-predicted plots for the two models, the model with the Limit of Detection taken into account properly is tighter around the 45-degree line for the uncensored observations. We see that the model ignoring the detection limit is just generally less accurate, as the observations are more spread out across the 45-degree line. Those are our two case studies. In summary, I want to reiterate that detection limits are very common in chemical and biological studies. As we've seen in our case studies, ignoring detection limits introduces severe model biases. The most important message is that using the column property or setting up the detection limits in the custom designer makes analyzing detection-limited data much easier to get correct. There are some pitfalls to watch out for, in that if you see standard errors that are unrealistically small, or models that are unrealistically accurate, you may need to back off from the quadratic terms or possibly even interaction terms. We've shown how histograms can be used to identify when we have a detection limit situation. It's useful to see the censoring relationship between different factors, because if there are big corners of the factor space where all the observations are censored, then we may not be able to fit interactions in that region of the design space. Again, if the model looks too good to be true, go ahead and try a simpler model; back off a bit. That's all we have for you today. I want to thank you for your attention.
Labels
(10)
Labels:
Advanced Statistical Modeling
Basic Data Analysis and Modeling
Consumer and Market Research
Data Blending and Cleanup
Data Exploration and Visualization
Design of Experiments
Mass Customization
Predictive Modeling and Machine Learning
Quality and Process Engineering
Reliability Analysis
0 attendees
0
0
0 attendees
0
0
Pack, Stack, and Response Screening Hacks: How JMP® Clinical Uses New JMP® 17 Features (2023-EU-30MP-1285)
Saturday, March 4, 2023
With version 17, JMP Clinical is now a fully JSL implemented product. This presentation will demonstrate the reimagined JMP Clinical and how it uses new JMP 17 features. Three new features in Tabulate format the tables produced by clinical reports to be publication-ready. Pack combines counts and percentages (or other statistics) into one column, while stack allows multiple grouping variables to be combined into one column. Tables displaying event counts also take advantage of the new Unique ID feature in Tabulate to count events only once per subject identifier. With these three new features, tables can be copied and pasted into any publication or report. JMP Clinical’s risk reports also use JMP’s new Response Screening platform to identify safety signals by calculating risk difference, relative risk, and odds ratio faster than previous versions. With all these new JMP features, JMP Clinical produces publication-ready reports quickly and effectively. Hi. Thank you for joining me today. My name is Rebecca Lyzinski. I'm a senior software developer for JMP Statistical discovery. Today I'll be talking about how JMP Clinical uses some of the new JMP 17 features, such as Pack, Stack, and Response Screening. First I'll talk a little bit about what JMP Clinical is. Then I'll go into what changes have occurred in JMP Clinical 17 compared to previous versions, and then show a demo of JMP Clinical and how it uses the new tabulate features of Stack, Pack and unique IDs, as well as the new Response Screening platform. First, what is JMP C linical? JMP Clinical is a JMP product that is used to analyze clinical trial data. It works by using the standard formats of CDISC, SDTM and Atom data. Once the data is loaded, JMP Clinical runs interactive reports for events, findings, interventions and more. JMP Clinical is used by a variety of fields, including medical doctors, medical writers, clinical operations, and statisticians. In addition, JMP Clinical works with JMP Live to share reports across your organization. With JMP Clinical 17, there's a big change, in that JMP Clinical no longer uses SAS as the basis for the code underlying the reports. Starting with JMP Clinical 17, it is now completely built off of JMP. This means that we have a faster installation because the installer is now more compact than it was before. JMP Clinical 17 also has all of its reports redesigned using JSL as the underlying code system for the reports. Another change is that now the reports will auto run. There's no longer a need to click a button in order to get the report to run. JMP Clinical 17 will also include some new reports, including the FDA Medical Queries and the Algorithmic FDA Medical Queries? One additional change is that now all the study preferences are in one location. You only have to go to one place to change a preference, and it will take effect across all of your reports. Now I'm going to switch over to JMP Clinical for a quick demo. When you first open JMP Clinical, a main window will appear with three different tabs one for Studies, one for Reviews, and one for Settings. The Studies tab is where all your study data is located. Here you'll see that I have the study, the Nicardipine loaded. You'll see paths for the SDTM and Atom locations of your data, as well as which domains from those folders have been loaded for the study. This is also where you can add a new study. You can refresh the study metadata for an existing study. 
If you add data to it, or you add variables, or you change variable names, you can refresh the metadata and all those changes will take effect. You can also set study preferences or set the value order in color for a given study f rom this tab. Set study preferences is new in JMP Clinical 17. It will open a new dialog. Here you can change any of these widgets and the new values will take effect across all of your reports. F or example, if you didn't want your reports to run off the safety population and you wanted them to run on all subjects instead, you can change to all subjects here. Once you click Okay, all your reports will now run off of all subjects instead of the safety population. The next tab is for Reviews. Here, when you click Start New Review, the Review Builder will open and you'll be able to select which reports you want to see. For this example, I'm going to look at the demographics, distribution AE D istribution, AE Risk Report and the two FDA medical query reports. If you wanted to add additional reports, you can click on Add Report. A new window will open up with all the possible reports you can run on this study, and you can make additional selections. Demographics distribution is usually a good place to start in any clinical trial. Here there are tables and distributions for each demographic characteristics such as sex, race and age. Tabulate is used to create the tables at the top, and you can see here that the Counts and Percents are combined into one column using Tabulate's new feature of Packed Columns. Underneath is a distribution for each of the demographic characteristics. On the side, there's an option to add additional distributions if there are other characteristics you would like to see. By clicking the Add button, you can add any variable from either the ADSL or DM data set, and it will show up under Distributions. There's also an option to perform treatment comparison analyses. When this button is clicked, the report will automatically rerun. Now at the bottom of the report, there's a one way analysis for age and a contingency analysis for sex and race. This allows for comparisons between treatment groups to be done to see if there are any differences between the treatment groups. Typically, an important safety analysis that occurs in any clinical trial is to analyze the adverse events that occur throughout the trial. In Adverse events distribution, there's a graph and a table showing the distribution of adverse events across treatment groups. At the top is a bar chart with the count of adverse events split out by NiNicardipine and Placebo, the two different treatment groups for the NiNicardipine study, they're shown in descending order for each treatment group. Under the graph is a tabulate. Here, you'll see that the first column is body system organ, class and dictionary drive term. These are two different measure terms that are used to classify adverse events, and they're being stacked on top of each other in the tabulate. In the other columns are Counts and Percent split out by the planned treatment group, as well as a total count and Percent. This table uses a lot of the new JMP 17 features for Tabulate. The first one is the Stack Grouping Columns. Here you can see if you right- click on the Column, the Stack Grouping Columns option is checked. If we were to uncheck it, it gets split back out into two separate columns. 
This is how Tabulate worked in JMP Clinical 8.1 and previous versions, where we had to have two separate columns for the two different variables. Now, by selecting both columns, right-clicking, and going to Stack Grouping Columns, we can combine them back into one column. This allows the table to be publication-ready for any PowerPoint or journal article that it might be used in. Somewhat similarly, we have the Count and Percent in one column, which did not exist before. If you right-click on one of these columns, you'll see the new Pack Columns option. If we unpack the columns, they're now separated into two columns, one for the Count and one for the Percent. By selecting both columns, right-clicking, and going to Pack Columns, we can pack them back into one column so that the Count and Percent show up together. The other option that this table uses is that if you open up the control panel from the red triangle, you'll see that there's an ID variable that's been added, which didn't exist before. Here you'll see that Unique Subject Identifier has been entered as the ID variable to use in this table. What that option does is count each subject only once on each row of the table. For example, if a subject had both a vasoconstriction event and a hypertension event, they would only get counted once within Vascular Disorders. Previously, before the ID variable existed, this Vascular Disorders row would have been a sum of all of the events that happened underneath it, which may overestimate the number of subjects that had a vascular disorder event. You can also see at the bottom of the table that this option now adds a row called All. This represents the number of subjects with any adverse event, which is another nice feature added through the ID variable. With these three changes, we now have a very nice publication-ready table to print out to whatever Word document or PowerPoint you want to include it in (a minimal JSL sketch of a table like this appears at the end of this passage). A couple of other features to mention on this report before moving on to the next one: there are some options listed under Data. For example, if you wanted to look at a different MedDRA term than the ones that are automatically presented, you can change them here to Reported Term, High-Level Term, etc. You can also change the report to run on pretreatment events, on-treatment events, or off-treatment events. The Demographic Grouping widget will change out the variable on the y-axis of the Graph Builder, as well as change the variable used in the Tabulate, to whichever variable is selected from Demographic Grouping. There's also an option to stack both the table and the graph. For example, if you wanted to see the adverse events split out by severity, we can select severity. Now the bar chart is stacked by mild, moderate, and severe events. The table is also split out into columns for mild, moderate, and severe. The report also uses a local data filter in order to filter both the bar chart and the Tabulate. You can filter on things such as whether or not the event is serious, or whether or not the event is related to the study treatment. We can also filter on the overall percent occurrence of the adverse events. For example, if we only wanted to see adverse events that occur in 5% or more of the population, we can change this filter. Now the bar chart and the table are both filtered down to only adverse events that have at least a 5% occurrence in the population. Another way to analyze adverse events is through the Risk Report.
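For scripters, a minimal JSL sketch of a Tabulate along these lines might look like the following. This is an illustration only, not JMP Clinical's actual code; the column names (:AEBODSYS, :AEDECOD, :TRTP) follow CDISC conventions and are assumptions.

// Minimal sketch of an adverse-event Tabulate (assumed CDISC column names).
dt = Current Data Table();
tab = dt << Tabulate(
	Add Table(
		Row Table( Grouping Columns( :AEBODSYS, :AEDECOD ) ),  // system organ class and dictionary-derived term
		Column Table(
			Grouping Columns( :TRTP ),                         // planned treatment group
			Statistics( N )                                    // Count; a percent statistic would be added the same way
		)
	)
);
// The three publication-ready touches described above -- Stack Grouping Columns,
// Pack Columns, and the Unique ID role -- are then applied from the right-click
// menus and the control panel; their exact JSL option names are not shown in the
// talk, so they are left out of this sketch.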
The Risk Report uses the new JMP 17 Response Screening platform to create both a Risk Plot and a Tabulate. The Risk Plot shows the percent occurrence of subjects within both treatment groups, Placebo and Nicardipine, and it also shows the risk difference comparing Nicardipine to Placebo, along with a 95% confidence interval. The table repeats this information in tabular form, with columns for the Counts and Percents in each treatment group, as well as a column for the risk difference and the 95% confidence interval. The Response Screening platform works off a table that looks like this one, where we have Unique Subject Identifier as the first column, and then a column for each adverse event: an indicator column with zero representing no event and one representing an event. If we pop out this table, the Response Screening platform is located under Analyze > Screening > Response Screening. It will open up a new dialog where you can select the variables that you want to compare. Because there are 202 different adverse event columns, we've combined them into a group of columns; this allows you to select just one variable, and it will automatically put all 202 columns into the Y, Response role. We use Planned Treatment for our X and click OK (a minimal JSL sketch of this launch appears at the end of this passage). Response Screening then brings up this window. The default view is to look at the FDR p-values and a table of those values. JMP Clinical uses the two-by-M results table. This is where Response Screening calculates the relative risk, risk difference, and odds ratio. JMP Clinical works by making this table into a data table and then using Graph Builder and Tabulate to format it in the view that was shown in the report. In order to get the additional columns needed, if you right-click on the table and go to Columns, you can select the different 95% confidence interval variables as well as a total count and the different counts for the positive versus negative comparisons. Once Response Screening is run, the results are made into a data table, and the bar chart and the Tabulate are created. The Tabulate again uses the Pack Columns option to put Counts and Percents into one column, but it also uses it to put the risk difference and 95% confidence interval into one column. If we were to unpack this group of columns, you would see that it originally started as three different columns. Even with three different columns, we can still pack them together into one column. If you didn't like the format of the way they automatically packed together, you can right-click on the column and go to Pack Columns and then Template. Here you can change the format of how the column appears. For example, if you wanted to see brackets instead of parentheses, you could change them here. You could also change how the columns are delimited. The default is a comma, but you could use a semicolon or any other character that you wanted to separate your columns. Similar to the AE Distribution report, this report has a few different options. Some that are different are that you can change the risk measurement, so you can look at either risk difference, relative risk, or odds ratio. You can also display the risk difference as either a percent or a proportion, and you can sort the plot and the tables by risk measurement, count, or alphabetically. This report again uses a local data filter to filter both the plot and the table by either a dictionary-derived term, the risk difference, or the absolute risk difference.
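The sketch referenced above gives a minimal JSL launch of Response Screening for this kind of table. It is a sketch under stated assumptions, not JMP Clinical's code: the indicator column names and the treatment column name are hypothetical, and in the demo all 202 indicator columns are selected at once through a column group rather than being listed individually.

// Minimal sketch of the Response Screening launch (assumed column names).
// In practice all 202 adverse-event indicator columns (0 = no event, 1 = event)
// go into Y; only a few hypothetical ones are listed here.
dt = Current Data Table();
rs = dt << Response Screening(
	Y( :HYPOTENSION, :PHLEBITIS, :NAUSEA ),  // 0/1 indicator columns, one per adverse event
	X( :TRTP )                               // planned treatment group
);
// The two-by-M results (risk difference, relative risk, odds ratio and their
// confidence intervals) are then turned into a data table and formatted with
// Graph Builder and Tabulate, as in the report.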
Back in the report, you can see that I filtered the risk difference down to two or greater so that we can see the plot and table a little more clearly. Another view of the Risk Plot and the Response Screening output is the FDA Medical Query Risk Report. This starts out as just being called the Medical Query Risk Report, and then there's an option to analyze it either by FDA Medical Queries or Standardized Medical Queries. Medical queries are a way to group adverse events into different medical conditions, and these are the two different standards. Standardized Medical Queries are created by MedDRA and usually come as SD files. In September of 2022, the FDA released their own Medical Queries as an Excel file that can be downloaded from the web. JMP Clinical handles both of these standards, and you can switch back and forth on this report by selecting either FDA Medical Queries or Standardized Medical Queries. Just like on the AE Risk Report, there is a risk plot with the percent occurrence for each treatment group and the risk difference between Nicardipine and Placebo. The difference is that on this report, the risk plot is split out by scope: either a broad medical query or a narrow medical query. Underneath, some custom scripting is used to create tables that stack the medical queries by the preferred terms that contribute to them. Just like in the AE Risk Report, we have columns for the Counts and Percents, as well as a column for the risk difference between Nicardipine and Placebo. Here you can see that, for example, for Arrhythmia, the dictionary-derived terms that contribute to that medical query are Atrial Flutter, Atrial Fibrillation, Arrhythmia, Bradycardia, and a few others. Underneath that table is the same table just for broad medical queries split out by preferred terms, a table for medical queries split out by broad or narrow depending on the scope, and a table showing which medical queries are contained in each system organ class. For example, gastrointestinal disorders is made up of abdominal pain. The last report I'm going to show is a brand new one in JMP Clinical 17.1 and versions beyond that. Within the FDA Medical Query Excel file, there are some text boxes for different algorithms in a few different medical queries. The algorithms include criteria that are not just limited to adverse events. For example, in Hyperglycemia, a subject could be categorized as having Hyperglycemia if they have an adverse event that falls into the Hyperglycemia FMQ category. But they could also be classified as having Hyperglycemia if, within the lab data set, they have more than two plasma glucose values over 180 milligrams per deciliter (see the sketch after this paragraph). This report uses the adverse event data set, the lab data set, and the concomitant medications data set to determine if subjects have a given medical query, rather than just looking at the adverse events and mapping them to a medical query. Similar to the other risk reports, this report uses a local data filter to allow you to filter on the medical queries, the risk difference, and the absolute risk difference. Again, we have the same options to switch between event type, the risk measurement (risk difference, relative risk, or odds ratio), and sorting the table by risk measurement, count, or alphabetically. That was a quick overview of some of the JMP Clinical features and how JMP Clinical uses the new JMP 17 features in Tabulate and Response Screening to make our reports. However, JMP Clinical is a much bigger product than just those five reports.
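The sketch referenced above shows, in plain JSL, the kind of lab-based check the algorithmic report applies for Hyperglycemia. It is an illustration only, not JMP Clinical's actual code; the CDISC-style column names (:LBTESTCD, :LBSTRESN, :USUBJID) and the "GLUC" test code are assumptions.

// Illustrative sketch: flag subjects with more than two plasma glucose results
// above 180 mg/dL in the LB (lab) data set (assumed column names and test code).
lb = Current Data Table();
lb << Select Where( :LBTESTCD == "GLUC" & :LBSTRESN > 180 );
high = lb << Subset( Selected Rows( 1 ) );       // keep only the qualifying lab results
counts = high << Summary( Group( :USUBJID ) );   // one row per subject; N Rows = number of qualifying results
counts << Select Where( :N Rows > 2 );           // subjects meeting the algorithmic hyperglycemia criterion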
We actually have over 30 interactive reports. Some commonly used ones that I didn't mention are the Adverse Event Narratives, the Patient Profiles, and a Study Flow Diagram, like the figure below, that shows you how subjects progress through the study, as well as the ability to analyze Hy's Law cases. JMP Clinical also works with JMP Live. At the top of each report there's a button that, if you click it, will publish and share the report across your organization. There are also future features coming in 17.1 and later versions, such as adding crossover support for analyzing crossover studies. There'll be even more reports added, such as a couple of oncology reports. Thank you so much for your time. I would appreciate any comments or feedback if you want to leave them or email me directly. Again, thank you for your time, and I hope you have a wonderful day.
Labels (13):
Advanced Statistical Modeling
Automation and Scripting
Basic Data Analysis and Modeling
Consumer and Market Research
Content Organization
Data Access
Data Blending and Cleanup
Data Exploration and Visualization
Design of Experiments
Mass Customization
Predictive Modeling and Machine Learning
Quality and Process Engineering
Reliability Analysis
Predictive Models for (BCO) Lameness Using Health, Stress, and Leg Health Parameters in Broilers (2023-EU-PO-1342)
Saturday, March 4, 2023
ABSTRACT Stress and lameness negatively affect the health, production, and welfare of animals. The following physiological and non-invasive measures of stress and lameness were measured: core body temperature, corticosterone (CORT) concentrations in serum and feathers, surface temperatures of the head (eye and beak) and leg (hock, shank, and foot) regions by infrared thermography (IRT), and leg blood oxygen saturation (leg O2). The JMP Pro 17 Model Screening platform was used to fit several parametric and machine learning models to the binary response variable (Lame=1) of the 256 study birds on the nine health and stress indicators mentioned above. We selected K-fold cross-validation with K=5 and repeated the process twice (N Trials Folds=10). The best models were the Neural Boosted (mean AUC=0.985 and misclassification rate of zero in 50 validation birds) and the Generalized Regression Lasso (mean AUC=0.975 and misclassification rate of 3 in 50 validation birds). The Stepwise Logistic Regression, the most interesting for explaining BCO, required only seven of the nine indicators and had overall fit performance similar to the other two. Both regression models agreed on the significant predictor effects, and when Model Comparison was applied to compare them further, their AUCs were not significantly different from each other. Hello, everyone, and welcome to our presentation. My name is Dr. Andy Mauromoustakos. I'm a professor at the Agricultural Statistics Lab at the University of Arkansas. My co-presenter is Dr. Shawna Weimer. She's an Assistant Professor in the Poultry Science Department at the University of Arkansas, and she's the Animal Welfare chairperson at the university. We're going to talk to you today about predictive models for BCO lameness using health, stress, and leg health parameters in broilers. Our presentation is going to be short. We're going to discuss the models that we are fitting, the champion models, the ones that get the medals. We're going to evaluate them, and we're going to have some conclusions in the end. Shawna? All right. This study compared physiological and non-invasive measures of stress and lameness in clinically healthy broilers and broilers laden with BCO, or bacterial chondronecrosis with osteomyelitis. BCO is a leading cause of infectious lameness in broiler chickens, with flock diagnosis requiring euthanasia. Thus, there's a need for technological innovations to detect health and lameness status in animals to project the disease likelihood prior to clinical onset. In this study, birds were raised in separate environmental chambers with either wood shavings on the floor (the litter) or a wire flooring model that is validated to induce BCO lameness. Nine non-invasive measures of stress and lameness were collected from 256 male broilers over several weeks, which included core body temperature, the stress hormone, and surface temperatures of the head (the eye and the beak) and the legs (the hock, the shank, and the feet) with infrared thermography. Leg blood oxygen saturation was also measured with a pulse oximeter. Of these measures, we sought to validate two. The first was extraction of corticosterone from the feathers. Corticosterone is the major primary stress hormone, and the gold standard for measuring it is blood serum concentration, which requires the capture and restraint of the bird to collect it. So if feather corticosterone could be validated, then we could simply clip a feather and not put the bird through that stress of restraint and blood draw.
The second was the thermal images, in which each pixel has its own temperature recorded and can be used to quantify external changes in skin temperature related to blood flow, offering a non-invasive tool to measure health and welfare. During stress, peripheral blood is shunted to the core, and we expected the average pixels of the eye and the beak, or Eavg and Bavg, to be lower in lame than sound birds, which we correlated with the serum corticosterone. For the thermal images of the legs, we expected the average pixel temperature of the hock, the shank, and the foot to be lower in the lame birds, both for the stress reasons and also because the colonization of the bacteria slows the blood flow, which we correlated with the leg blood oxygen saturation. There were marked differences between lame and sound birds. Our objective is to identify which of these nine health and stress indicators are important for lameness. We want to build models both for prediction purposes and for explanation; we are in agriculture, and we like to publish papers that try to explain how our inputs affect the response. We are hoping that some of the models, the traditional regression models, will do fairly well and we can interpret those. In our methods, we talked about… It's a balanced study. It's a matched-pair experiment where every time a sound bird is observed, a lame bird is also observed on the same indicators. We have a disease incidence of 0.5. We're going to take advantage of the Model Screening platform in JMP Pro 17 (a minimal JSL sketch of this launch appears at the end of this passage). We're going to select Lame, which is categorical with values 1 and 0; 1 stands for yes, it's lame, and 0 is sound. We have our nine predictors in here. We're going to use the defaults and fit all of the machine learning models that are checked. We're going to select cross-validation, 5-fold cross-validation. With our approximately 250 birds, we're going to have about 50 birds per fold, and we're going to repeat it twice. We selected the random seed so we can reproduce the results. Notice that I did not add the quadratic terms and interaction terms, to hopefully have easier interpretations. When we select this and click Run, JMP takes about a minute. It's going to come up with the ranking of the models that we have fitted. This is what we call the beginning of the end. We have 10 different data sets that we have tried. These are the average fit criteria, such as the R Square (the higher, the better), and this is the ranking: the best, second best, third best. Hopefully for us, we expected that the Neural Boosted model might do a little bit better than the traditional regression models, such as the Penalized Regression and the Logistic Regression models, but we were happy to see that these were a close second. If we want to see how our best model did, we can see that the best model, the Neural Boosted, which is the best model for prediction purposes, had a misclassification of 0. You can see here, both in the training and the validation, that our receiver operating characteristic curve reaches 1 very soon and stays at 1, which is extremely good. We see from the confusion matrices that out of the 49 birds, we're not really misclassifying any of them, so this is a very good model with a very high Generalized R Square and all of the other fit criteria that are produced here. But we are more interested in the regression type of models.
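The sketch referenced above gives a minimal JSL launch of Model Screening for this problem. The predictor column names are placeholders for the nine indicators, not the study's actual column names, and the cross-validation settings (K = 5, two repeats) and random seed are left to the launch dialog because their exact JSL option names are not shown in the talk.

// Minimal sketch of the Model Screening launch in JMP Pro 17 (placeholder column names).
dt = Current Data Table();
ms = dt << Model Screening(
	Y( :Lame ),                        // binary response: 1 = lame, 0 = sound
	X( :SCORT, :FCORT, :Core Temp, :Eavg, :Bavg,
	   :Hock Temp, :Shank Temp, :Foot Temp, :Leg O2 )
);
// K-fold cross-validation (K = 5, repeated twice) and the random seed are set in
// the dialog before clicking Run, as described above.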
Turning to the regression-type models: for the Generalized Regression, we can see that when we use Lasso as the estimation method, the model included all of the variables, and we have a couple of non-significant variables in the model here. We can see these non-significant factors, such as FCORT; you can see that it's not crossing the horizontal line. You can see that the eye average is not significant, but it is included. We can see that this model had approximately three misclassifications, a misclassification rate of about 7%. Here is the third model. If we had to give out gold, silver, and bronze, these two would share second place. The Logistic Regression model, when we did the stepwise procedure, decided not to include the two highly non-significant factors. We can see in our regression model, the Logistic Regression model, that SCORT is the most important variable. This model has a similar misclassification rate of three, the same as the Generalized Regression. Here is how the seven indicator variables are used to predict the probability of lameness. What we like about the regression-type models is that we can get odds ratios that will help us interpret them. For example, for our most important variable, we can see that the odds of lameness doubles with a one-unit increase in serum corticosterone. Overall, we would like to say that the Logistic Regression model only used seven out of the nine indicators. The Neural Boosted model and the Generalized Regression both used all nine indicators for lameness. All of the models have an area under the curve greater than 0.9. The regression-type models had the 7% misclassification on the validation set, and the Neural Boosted did not misclassify anything. We can compare our three winning models using the Model Comparison platform in JMP. We can see that in terms of strictly predicting, the Neural Boosted model is significantly better than both the Generalized Regression and the Logistic Regression model. The two regression models, the one with all nine variables versus the one with the seven variables, are not significantly different from each other. They both had a very similar area under the curve, and they had a similar misclassification of three birds. This is our presentation. We'd like to thank you for your attention. We have some references where you can find material related to the techniques that we used in the JMP documentation, which will help you through this process. Thank you for your attention.
Labels (10):
Advanced Statistical Modeling
Basic Data Analysis and Modeling
Consumer and Market Research
Data Blending and Cleanup
Data Exploration and Visualization
Design of Experiments
Mass Customization
Predictive Modeling and Machine Learning
Quality and Process Engineering
Reliability Analysis