Thomas Walk, Large Plant Breeding Pipeline DB Manager, North Dakota State University Ana Heilman Morales, Large Plant Breeding Pipeline DB Manager, North Dakota State University Didier Murillo, Data Analyst, North Dakota State University Richard Horsley, Head of the Department of Plant Sciences, North Dakota State University   Crop breeders, often managing numerous experiments involving thousands of experimental breeding lines grown at multiple locations over many years, have developed valuable data management and analysis tools. Here, we report on more efficient crop evaluation with a suite of tools integrated into the JMP add-in dubbed AgQHub. This add-in provides an interface for users to first query MS SQL Server databases, and then calculate best linear unbiased predictors (BLUPs) of crop performance through the mixed model features of JMP. Then, to further assist in selection processes, users can sort and filter data within the add-in, with filtered data available for building reports in an interactive dashboard. Within the dashboard, users segregate selected crop genotypes into test and check categories. Separate variety release tables are automatically generated for each test line in head-to-head comparisons with selected check varieties. The dashboard also provides users the option to produce figures for quickly comparing results across tested lines and multiple traits. The tables and figures produced in the dashboard can be output to files that users can readily incorporate into variety release documentation. In short, AgQHub is a one-stop add-in that crop breeders can use to query databases, calculate BLUPs, and generate report tables and figures.     Auto-generated transcript...   Speaker Transcript Curt Hinrichs Alright, Tom Walk, with Anna and Didier with their poster on AG.Q.Hub. Tom, take it away. Tom Walk Thank you so much, Curt, and thank you to the JMP community for inviting us to this presentation. We're so glad to be here to show you our work. Today we're going to talk about a tool we're building at North Dakota State University in the Department of Plant Sciences called AG.Q.Hub. It's primarily the work of our team, the plant breeding database management team, of myself and Anna and Didier and Rich Horsley, who's the department chair of plant sciences. And what we've done is we're trying to help the plant breeders who've had this long-established cycle. You can imagine that if you want to improve crops, it's going to take a long process. If you want to do it right and consistently, you have to have set up a lot of experiments. You're not just going to get lucky very often and choose the best crop. So what you have to do is, you have to set up a lot of crosses and a lot of trials with thousands of lines initially, and from those, you have to go through this decade-long cycle or more, and choose...make choices every year about which were the best lines to advance. And this is...all would change with environmental conditions, so we have to get that right combination of genes with the environment. And you have to have the right analyses and experimental designs to do that, so this gets very complicated for a plant breeding team. And it's a long process to make any variety selections. And what we want to do is to make the selection processes easier every year. It'd be nice to shorten this whole process, but our more immediate goal is to make the process more efficient, the selections more efficient at each stage. 
And what we've done for that, is we're developing this tool AG.Q.Hub, and we were using that with our breeding programs. We have 10 breeding programs within plant sciences at NDSU and over 60 users. We've incorporated also two research extension centers with more variety trials and field sites. And this is...this list is growing. We're trying to add more users and will probably add in more research extension centers. So what what's nice about AG.Q.Hub and the reasons why we have these users is because we have the functionality at AG.Q.Hub and it allows you to connect to the database directly and you can see data from decades worth of experiments. And once you have that you can do this analysis. You can look at experimental designs, you can view the histories of your varieties, you could look at distributions of data for individual experiments, you can calculate blups and make those predicted values. And once you have those predicted values, you can get those head-to-head comparisons, and if you have those compare...once you have those head-to-heads, then we can start building reports. And that's where we're going to get into is, we want to be able to make reports with subsetting data and build tables and the visualizations that make it a lot easier for our users. And all this is going to be done within seconds, with a few clicks in AG.Q.Hub and it's saving what's up to weeks of time in the past where users using spreadsheets and workbooks. So just to give you an idea of a workflow in AG.Q.Hub, here's one cycle of generating data for reports. And so what users will go into AG.Q.Hub, they'll select a database they want to use, and what type of analysis or query they want to have, and the output they want to have. And once they...then they'll click start and then after that, another window will pop open that will prompt them for the parameters for the queries, such as the experiments they want to query, the years, the traits or treatments that they want to look at. And after they select their parameters and click OK, then the data will pop up in these data tables and the data tables are in... within the AG.Q.Hub add-in, so all the data tables are compiled nicely within these tabs in here in AG.Q.Hub. And then here are some of the newest features we have is that users can fill...they can select the varieties they're interested in, and they can sort, and they could select, and then they could do some filtering, and then they can make these filtered variety tables. And with with these filtered variety tables, they can export those into their reports into Excel or other documents. And once you're done with one, you can start over and move on to another set of experiments. And click cancel after you do as many analyses as you wish to. We're still working on this, it's a work in progress and we get a lot of great ideas from our users. What we want to do... we're always...we're always expanding this to more users and research extension centers. That's been helpful for us to build this up. As we do that, we're looking to compile templates and release...of release tables used by the programs. With that we can build up some sort of output tables that make it easier for the users to produce head-to-head tables and variety release tables. And then we'd also like to make it easier by adding visualizations for making quicker variety comparisons. And Anna has some great ideas with that, with her experience as a plant breeder. 
Finally, we there's always...we're always looking to make the interface more dynamic with maybe changing options as users click things. And with that, that's my talk, but I would like to start this video, this short video to give you an idea...give you a better idea of what AG.Q.Hub does. And Curt, thank you for this opportunity again. With that, I do have one more thing I'm excited to show you and I'm going to request the share screen. I want to show our users one more thing, and this is the newest things with AG.Q.Hub. This is what we're excited...this is the direction we're going. What we have...what we have here is once the users make their selections and filtering and what they want to make tables.... the varieties they want to make tables with, we're making dashboards that open up and they can select among their filtered varieties, for which ones they want to be check lines. Like, for example, we want to be this historical varieties to be our check lines, and we want to compare our new test lines against those check lines. And we're going to select the different types of traits we want to look at, the traits we've seen in the field versus traits we measure in a laboratory or traits such as disease traits. And then we click make tables and it'll output these tables. I'm not going to do that right now to save you a little time. But what it's going to do is output tables for each of these traits, for each of these varieties, and you can see it's still needs some work. I need to put the names of the varieties in, but it's still...we're working on this, but I'm very excited and I wanted to show you this before I end. And that's the direction we're going and we're going to build on this dashboard to keep making these tables and make outputs for our users so they can put these the better formats for Excel. They can format on further outputs to Excel and Word or whatnot, and build visualizations where we can do head-to-head comparisons by comparing how these...this variety does against this variety. So we're building up on these dashboards. We are very excited about this and I'm so happy to show this and share this with the JMP community. And before I go, I want to thank all the North Dakota State University Anna, Didier, Rich and myself, and all our other users. Thank you so much.  
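The core query-then-BLUP flow that AgQHub wraps can be approximated directly in JSL. The sketch below is illustrative only: AgQHub itself is not public, so the ODBC data source (BreedingDB), the table, and the column names are assumed stand-ins, and the model shown is the generic REML random-effects form JMP uses to produce BLUPs rather than AgQHub's actual model.
    // Hypothetical stand-in for AgQHub's query step: pull trial data over ODBC.
    // "BreedingDB", the table, and the column names are assumed, not the real schema.
    dt = Open Database(
        "DSN=BreedingDB;Trusted_Connection=Yes;",
        "SELECT Line, Location, Year, Rep, Yield FROM YieldTrials WHERE Crop = 'Barley'",
        "Barley Yield Trials"
    );
    // Genotype and environment terms declared random (& Random) so the predicted
    // line effects are BLUPs; REML is the estimation method used for the fit.
    dt << Fit Model(
        Y( :Yield ),
        Effects(
            :Line & Random,
            :Location & Random,
            :Year & Random,
            :Line * :Location & Random
        ),
        Personality( "Standard Least Squares" ),
        Method( "REML" ),
        Run()
    );
From there, the predicted line values can be sorted, filtered, and exported for the head-to-head and variety release tables described in the talk.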
Monday, October 4, 2021
Caroll Co, Research Scientist, Social & Scientific Systems Helen Cunny, Toxicologist, Division of the National Toxicology Program, NIEHS Keith Shockley, Staff Scientist, Biostatistics & Computational Biology, NIEHS David Umbach, Staff Scientist, Biostatistics & Computational Biology, NIEHS   The dashboard functionality in JMP is a fantastic tool that can serve as a good alternative to R Shiny. Compared to R Shiny, JMP dashboards are easier and faster to build and can save lots of time in app development. In this poster, I talk about how JMP dashboards can be a tool for:          Sharing the excitement of data discovery with others.          Personalizing the presentation of graphics.          Creating a visual storyboard to convey information.     Auto-generated transcript...   Speaker Transcript Caroll Co Hi, everyone. I am Carol Co. I'm a statistician at Social & Scientific systems, and today I'm going to talk to you about   how you could use the dashboard functionality in JMP to showcase your data and to create a visual storyboard.   So, unlike R Shiny, JMP allows you to create an app in a matter of minutes, without having to learn to code.   So you could spend a lot more time in exploring your data and to do your statistical modeling, rather than coding an application from scratch. And I think that's really like the biggest advantage JMP has over R Shiny.   So there are many reasons why an app can be useful. In my case, I was dealing with a complex data set   with too many what-if scenarios and a lot of higher order interactions, and so producing static graphs had become too cumbersome and inefficient to work with.   I was also working with the work group coming from different backgrounds and different technical and computational skills, so I was really looking for a platform where any user from any...coming from any background can find it easy to navigate.   So now I'm going to show you how to build a dashboard using JMP. I'll switch to, like, my JMP data table.   Okay.   So here is a data set...a sample data set that came from a simulation study.   I have eight columns in here, six of which are factors from an experiment and then I have two different responses that were collected. So because I've already spent a lot of time   analyzing this data, I already know, like, some of the data features and what I want to go on my dashboard. So they are   (let me show a panel here) like the distribution of all of my columns.   Because I have two responses, I also wanted to show the relationship between my two responses and, as you can see, one of the...one of the factor in my data, which deals with whether the data is balanced or unbalanced, completely   explains the separation between these two points. These two are features. And then I also wanted to show a graph builder to kind of link the relationship between my response and the different factors in my experiment.   So, to build a dashboard all you need to do is go to file, click on New,   and then choose dashboard. And JMP already has some built-in templates in here that you could use. If none of this really fits what you're looking for, just pick any one of them, because it's really easy to change them.   So I already have kind of like a vision in mind of how I want this dashboard, this application to look like. And say I want the distribution to kind of go up here in the top right,   my scatter plot to go here on the bottom,   and then, this graph builder chart to kind of go like just right next to it.   
I also wanted to have a data filter in there, because of that, like, data balance issue that I...that I'm seeing with my scatter plot.   So I forgot to mention that before you do this, like, the open reports that you have   in your JMP session, which show up here as thumbnails in these...in the Left panel here, and so really all you're doing is, like, you're dragging and dropping to arrange...   to have the layout that you want. And once you're happy with that, just click this Run Script icon in here. Just wait a couple seconds and here's your dashboard.   This is basically what I intend my user to see. I don't want them to see the data table. This is...I don't want them to bypass that, I just want them to see this application so that they can start exploring the data.   So for...   for you guys who are familiar with JMP, you know that the...   a lot of, like, the the interactivity, the dynamic linking is preserved and so that's really a great way to kind of showcase your data and tell a story.   In my particular case, because we were dealing with two different responses, maybe one of the questions we're looking at was what are the settings that are giving us the optimal response.   And so, in my case, in this case, it was responses where both response 1 and response 2 are greater than 80%.   So if I just highlight that region, I can kind of like quickly see that most of these observations are coming from the higher sample size and when Factor C is levels A and B and so on.   On the bottom right,   again, because I've only had...I'm only highlighting the observations that are giving me the good response, I can kind of quickly see these three cells on the bottom left, none of them are highlighted so suggesting that the the settings that are in here are all poor choices.   Another way to kind of like look at your data is to use, like, this filtering option. And because of this data feature that I have, I'm going to filter it by the data balance.   And so now, if I just want to look at the end...the unbalanced case first, I can just quickly select that and everything is interactive and responsive.   One more thing that you might be kind of like thinking, what is this stuff that's happening right here? This looks really weird. So   you can kind of quickly highlight that region as well, and you can kind of see that the observations that are coming here are coming from the 100 sample size and when Factor D is Levels 3 and 4, and when Factors C is at E and D.   So that's a really great way to kind of like explore your data.   So as a data analyst,   a lot of the,   like, requests I get would be, like you know, changing the aesthetics of the graph.   So, usually it would be somebody who doesn't really like my color scheme or like the line type that I chose, or even the arrangement of some of...because I have like, 1, 2, 3, 4...4 different factors in this plot.   What's so great about presenting your data this way is that the user has complete autonomy in making those changes themselves, just by right clicking. So if you're familiar with Graph Builder, you know this, you can change the colors,   the style and the width, but this...whenever I show this to other people, where you can, like, swap variables, so say if I want to swap this Factor E with...   let's say, with sample size, you can get a completely different picture. And whenever I show this to people they're just like so stunned, because this is such a great tool to showcase your data.   
So if you're happy with the layout of this, I would suggest you just like hit this red triangle button up here.   And then just save your script to your data table. Or if you're realizing there's something in here that you're not...you don't really like, you want to make changes, or you want to make it   kind of more statistically savvy, maybe add a fit model...results from a fit model, you can go back to edit dashboard.   And that'll take you back to your dashboard builder, where you can, like, take out some of these containers and add new ones or totally recreate it. Or you can have multiple versions of the same data set to cater to different types of audiences.   So in conclusion, I think displaying your data in this way can be a really powerful tool to communicate your findings to your audience and   your users, because of, like, the ease of use. Your use...your users   can play with the application without even needing to touch the data table or needing to code anything.
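The same kind of linked dashboard can also be written in a few lines of JSL instead of assembled by drag and drop. This is only a minimal sketch, using the shipped Big Class sample table in place of the simulation data set from the talk.
    // Minimal scripted dashboard: a local data filter on the left drives the
    // Distribution and Graph Builder reports on the right.
    dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
    New Window( "Mini Dashboard",
        Data Filter Context Box(   // scopes the filter to the reports it contains
            H List Box(
                dt << Data Filter( Local, Add Filter( Columns( :sex ) ) ),
                V List Box(
                    dt << Distribution( Continuous Distribution( Column( :height ) ) ),
                    dt << Graph Builder(
                        Variables( X( :weight ), Y( :height ), Overlay( :sex ) ),
                        Elements( Points( X, Y ) )
                    )
                )
            )
        )
    );
As in the dashboard built interactively above, selections and the local filter stay dynamically linked across every report inside the context box.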
Isaac Himanga, Senior Analytics Engineer, ADM   Manufacturing processes are highly correlated systems that are difficult to optimize. This correlation provides an opportunity to identify outliers. Here, we demonstrate an entire workflow, which involves:   Obtaining data from the OSISoft historian. Modeling and optimizing with multivariate models that account for process constraints. Using the MDMCC platform. Scoring models online and writing abnormal conditions back to the historian. In practice, many processes are hard to optimize. Sometimes a system cannot be reduced to a set of independent variables that can be tested; in other cases, the process can become unstable under different conditions. To address these issues, we are using new features in JMP 16 to optimize for cost and quality in the latent variable space, while accounting for constraints on observed variables. In order to remain at those optimal conditions, we use the MDMCC platform to quickly identify deviations from normal.   Add-ins are used to minimize the time spent gathering data and scoring models: one to pull process data into JMP from the popular PI historian, one to quickly recreate calculated columns, and a third to score models online and send results to the historian, including contributions from the MDMCC.   Add-ins referenced are available on the JMP Community: Aveva/OSISoft PI Tools  Scripting Tools      Auto-generated transcript...   Speaker Transcript
Isaac Himanga My name is Isaac Himanga and I'm going to demonstrate a workflow I use at ADM to optimize manufacturing processes using multivariate models.   I'll start with pulling data and building a model, finding realistic optimal conditions, identify abnormal conditions, and finally score the model using current data.   There's a lot of other information on the individual platforms here, so instead of discussing the details of each step, I'll only highlight a few commonly used features and instead try to show the whole workflow.   I will say most analyses with the amount of detail this one has take a little longer than 45 minutes.   So head over to the article for this talk in the JMP Community for more detail and information, including a journal with screenshots of steps I'll move through pretty quickly here.   I'll start with a brief overview of ADM and the general workflow.   Then I'll put this presentation aside to show you the process in JMP. 
You'll see what it looks like to get and clean data, use the profiler to find out multiple conditions,   use the model driven multivariate control chart platform, write a script to score new data against that model, and finally, I'll return to this presentation to briefly give one method to continuously score that model using JMP.   First, a little about ADM.   ADM's purpose is to unlock the power of nature to enrich the quality of life.   We transform natural products into a complete portfolio of ingredients and flavors for foods and beverages, supplements, nutrition for pets and livestock,   and more. And with an array of unparalleled capabilities across every part of the global food chain, we give our customers and edge in solving the global challenges of today and tomorrow.   One of those capabilities is using data analytics to improve our manufacturing processes, including the method I'm about to talk about.   I am part of the relatively new focused improvement and analytics center of excellence.   And our growing team is invested in techniques, like this one, to help our 800 facilities, 300 food and feed processing locations, and the rest of our company around the world make better decisions using our data.   Now, an overview of the workflow.   The four steps I'll review today only represent part of the complete analysis. In the interest of time, I'm going to omit some things which I consider critical for every data set,   like visualizing data, using validation, variable reduction, corroborating findings with other models, and aligning lab and process data.   getting data, building a model, scoring that model on demand in JMP, and then scoring the model continuously.   JMP has tools to support each step, including an array of database connections, multivariate modeling modeling tools, like partial least squares, principal components, and the model driven multivariate control chart platform, and of course, the JMP scripting language or JSL.   Let's start with getting data. Despite the many database connections and scripting options available, we needed a quick way to pull data from our process historian,   a popular product called PI, without writing queries or navigating table structures. A custom add-in was the answer for most data sets. This add-in was relatively...was recently posted to the JMP Community.   Two more add-ins assist in this process. One, generically called scripting tools, includes an option to quickly get the script to recreate calculated columns   when combined with the save script to functionality built into most JMP platforms. Analyses can be recreated quickly and scored on demand by a JMP user.   The last add-in, called the JMP model engine, is also the newest. It uses a configuration file and information saved to the data table from the PItool's add-in to get data.   That makes calculations using column formulas or any other JSL script and then writes results back to the historian.   And the interest of time, I'm going to move very quickly through this last step, but again, I encourage you to look for more details on the JMP Community using the link on the   agenda slide of this presentation.   Each of these add-ins were overhauled to remove sensitive information but we're shifting users to the same code that's posted on the Community. So if you have ideas and how to improve them, feel free to collaborate with us on the JMP Community over...or over on GitHub.   With that, let's open JMP.   
Behind the scenes, the ADM...or the JMP PItool's add-in uses a couple different PIproducts to pull data, including SQL DAS, OLEDB and ODBC. The instructions to set this up can be found in the help menu for this platform.   Today we're going to pull data for a set of PItags from May 1, starting at noon, and then we're going to pull another value every day at noon until today.   We're going to do this for a certain set of tags, those are listed here in this...   in this box. Notice we've actually included what's called a friendly tag or a name that's more descriptive than just the name that's in the PIhistorian that will help us identify those columns later in the data table. There's all these little question marks around the   ...around the add-in giving more information, including a description for this friendly tag format that we're going to use today.   When I hit run query, it's going to open a data table that looks like this one. It's got a set of columns for all of the numerical values in that data table.   It's got a set of columns for any string value, so if you had characters stored in the PItag, it will pull those. And we can also see the status for that PItag   for each row.   Also in the column properties for that data table, if we open the column info, we're going to see information that was used to pull that data point, including the PItag,   the call type and the interval. For more complex queries, like an average or a maximum, we're going to see more information here.   I will note that this is real data that's been rescaled and the column names have been changed, but the analysis that we're going to see today should look and feel very similar to a real analysis using actual data from one of our facilities.   Finally, I'll point out that there's a script here saved to the data table called PI data source, which is the same script that's actually shown in the add-in   and it contains the SQL that's also available here. Again behind the scenes, this uses   SQL DAS or PI DAS in order to get that information from PI, and this is all the...all the scripts that it's going to run to get that data. We're going to come back and use this again near the end of the talk today.   Okay, now that we've got data, we need to clean that data up. We're going to use multivariate tools today to do that, specifically the principal component analysis. I'll pull all of the numerical values from that data table and put it into the y columns list and then   right away, we can see some values that have...quite a few values that have particularly low scores for component one. If you look at the loadings for those   different factors, we can see the component one includes high loadings for all of the different flows in this data set.   So that tells me that all of these values over here on the left have high flows across the board for the whole system. Using some engineering knowledge, I can say that this represents downtime for that process, so I'm going to go ahead and hide and exclude all of these values.   Now that we've done that, we'll recalculate the principal component analysis, so we'll use redo and redo the analysis and then close the old one.   And now we can see the loadings are perhaps a little bit more meaningful. Principal component three, for example, explains most of the variation in flow 2 and there's a little bit of separation here.   
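The downtime screening described here can be reproduced with a short JSL sketch. The column names below are placeholders for the rescaled tags in the demo table, and the exclusion rule is arbitrary; in the talk the rows were selected interactively from the score plot instead.
    // Assumed column names for the rescaled demo tags.
    Current Data Table( dt );
    pca = dt << Principal Components(
        Y( :Flow 1, :Flow 2, :Flow 3, :Flow 4, :Flow 5, :Quality 1, :Quality 2, :Quality 3 )
    );
    // In the talk the low-score (downtime) rows were lassoed in the score plot and
    // hidden/excluded; a scripted equivalent flags rows by an explicit rule instead.
    For Each Row(
        If( :Flow 1 < 1,   // arbitrary "essentially zero flow" downtime rule
            Row State() = Combine States( Excluded State( 1 ), Hidden State( 1 ) )
        )
    );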
The first three components explain the majority of the variation, so I'm going to use those three components when looking for other outliers in this data set.   To do that, I'll open the outlier analysis and I'll change the number of components to be three.   And then we can see both the T squared plot, and I can also open the normalized DModX plot   to see points that either have extreme values for one or more of the scores or points that have a single   column or a single cell that has a value that's unexpected or outside of the expected range, based on all the other columns in that data set for that particular role.   For now, we're just going to quickly clean this data set by excluding all of the T squared and DModX values that are above the upper control limit.   One more thing that's commonly done when cleaning a data set is transforming columns, and I want to show a feature of the scripting tool add-in that makes   it a little bit even trying to apply and   transfer to the new formula column menu. If I select three columns or any number of columns and go to the custom transformation option, which is again loaded as part of that scripting tools add-in,   I can select a Savitzkey-Golay transformation and hit OK, and it will add three columns to the end of the data table with a formula containing that transformation.   I will note that the cleaning we did could have been done directly and in PLS. I often use a PCA first, though.   Okay now we've cleaned our data set, we need to actually build a model   to try and predict that...our process conditions. Maybe another quick note about this data set, we have a flow target up here.   Today, our goal is going to be to create a certain amount of this target flow using as little of flow one as possible and also taking into some...   into account some constraints on these quality parameters. So because flow one is what we're primarily interested in, I'm going to switch over to a partial least squares model and use that flow...target flow in the y and all the other variables as X factors.   I'll just accept all the defaults for now and I'm going to jump ahead a little bit and right away use four factors.   When I open those four factors,   we'll see that the first two   represent the   variables that the plant normally changes in orders...in order to adjust how fast they run the process. So if they need to make more or less of this target flow, they often change factor one in order to achieve that that target rate.   Factor 2, on the other hand, relates primarily to these quality parameters, which are actually input quality parameters that we don't have control over.   So it's not something that we can change. So even though factors three and four have relatively...explain relatively small amounts of the variation of   our target flow and they explain relatively small amounts of the variation of our factor one, those are the ones that we actually have control over and so those are the ones that we're going to be able to use in order to optimize our process.   In order to use this...so we've we've built a model that explains the variation in our data.   In order to use that information, we need to save some of those columns or save the predictions from this to new columns in our data set   that we're going to use in the prediction profiler in just a few minutes.   We'll make use of a few new features that were added in JMP 16, allowing us to save predictions, both for the Y and the X variables, as X score formulas. 
And when we open the profiler, I think it'll help to illustrate why that becomes important.   So we've saved all three of the predictions, the X predictions and the T squares, back to our data table. Those should be new columns at the bottom as functions of these X scores.   We can also take a quick look at the distance and T squared plots within the PLS platform, and we see that while there's a few variables that have pretty high DModXs or T squareds, there's nothing too extreme.   These scores are often saved or are always saved with variable names that can become confusing as you save more and more or go through multiple iterations of your model. So the scripting tools contains another function, called the rename columns,   which will allow you to select an example query for PLS. It has a pretty complicated regular expression pattern here,   but notice it outputs a standard set of names that are going to look the same for PLS, PCA and other platforms within the data table.   So in this case I'm actually going to copy a specific prefix, we'll put before all of those columns indicating that this is the model we're going to put online for our quality control and it's revision one of that model.   When I change names, we can see it's it's automatically changed the name for all of these columns in the data set.   So we've built a model explaining the variation in   these columns, but what we haven't done is our original goal of figuring out how to produce a certain amount of flow...of our target flow using as little of flow one as possible. To do that we're going to use the profiler. Notice when we open the profiler,   we can add all of these predicted values, so not the X scores themselves, but the predicted values and the T squared, to this prediction formula section.   And then, when it opens up, we'll see across the X axis, all of our scores in the model or our latent variables.   And we can see when we move one of those scores, it's going to automatically move all of these observed variables together   at the ratio of the loadings in each one of those components. So importantly, take a look at these flows three and four, they always move together. No matter which score we move,   this model understands that these two scores are related. Perhaps one is a flow meter that they have control over, and perhaps a second one   is a second flow meter on that same line, but regardless it doesn't look like the plant is able to move one without moving the other one in the same direction.   So the goal is to find values for each one of these scores that are going to optimize our process.   Before we can do that, we need to tell JMP what it is we're trying to optimize. We need to say that we have a certain flow rate we're trying to hit and certain quality parameters that we're trying to hit.   So we're going to start by telling it that we don't care about the values for most of these columns. So we'll select all of the columns that we saved,   we'll go to standardize attributes, and we're going to add a response limit of none for all of these different columns.   Then we'll go back and we'll adjust that response limit. It can be more descriptive for the columns that we do care about. For example, the flow target will go back to response limit and we'll indicate that we want to match a value of 140 for that column.   Similarly for quality one, we want to hit the value of   20.15.   For the flow one, we want to minimize that value.   
And finally, we need to make sure that the solution that we come to is going to be close to the normal operating range of the system, so we don't want to extrapolate outside the area where we built that PLS.   To do that, we'll use this T squared column, and we'll indicate that we want to minimize the T squared   such that it's below the upper control limit that was shown in that PLS platform. Here we can see the upper control limit was 9.32, so we'll use that as the minimum value here.   What we should see in the profiler now is every value below 9.32 is equally acceptable and, as you go above 9.32, it's going to be less and less desirable.   Under the graph menu I'll open the profiler once again, and once again take all of those predictions and T squared formulas and put them into the Y prediction formula.   And we still see the X scores across the bottom, but now we also see these desirability functions at the right.   And once again the desirability function for T squared starts to drop off above 9.32. The desirability is highest at low values of flow one and we've got targets for both the the target flow and the quality parameter.   Because we've defined all of these, we can now go to maximize desirability.   And that's going to try and find values for each one of those scores, and thus, values for all of the observed variables in our data set that are going to   achieve this target flow and achieve the targets that are in these that we defined earlier.   Notice it came close to hitting the full target, but it looks like it's a little bit low. It did achieve our 20.15   and it was within the T squared bound that we gave it. Most likely JMP thought that this tiny little decrease in desirability was less important than reducing flow, so we can fix that by control clicking in this desirability function and just changing the importance to 10.   Now, if we optimize this again, it should get a little bit closer to the flow target. Before we do that, though, I'm going to save these factor settings to this table at the bottom, so we can compare before and after. And then we'll go ahead and maximize desirability.   Looks like it's finished and we're still within bounds on our T squared.   It has achieved the 20.15 that we asked and it's certainly much closer to 140. So now we could once again save these factor settings to the data table.   And we should now have factors that we can give back to the manufacturing facility and say, hey, here are the parameters that we recommend.   The benefit of using a multivariate analysis like this, we talked about those flows three and four being related earlier,   using this method, we should be able to give the plant reasonable values that they're actually able to run at.   If you tell them to run a high value for three and a low value for flow four, they might look at you and say well that's just not possible.   These should be much more reasonable. Note that not all of these variables are necessarily independent variables that the plant can control. Some of those might be   outcomes, or they might be just related to other variables. In theory, if the plant changes all of the things that they do have control over, the other things should just fall into place.   So now we've optimized this model, the next step is often to make sure, or to verify, are we running normally. So once the optimal conditions are are put in there, it's good to use the same model to score new rows or new data and understand, is the process acting as we expect it to?   
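Those desirability settings (match 140 on the target flow, match 20.15 on quality one, minimize flow one, keep T squared under its 9.32 limit) can also be scripted as Response Limits column properties before launching the Profiler. The column names below are assumed stand-ins for the saved prediction formulas, the limit values other than the ones quoted in the talk are placeholders, and the property arguments follow the common saved-script form, so treat this as a sketch.
    // Assumed names for the prediction-formula columns saved from the PLS fit.
    Column( dt, "QC1 R1 Pred Flow Target" ) << Set Property( "Response Limits",
        {Goal( Match Target ), Lower( 130 ), Middle( 140 ), Upper( 150 ), Importance( 10 )}
    );
    Column( dt, "QC1 R1 Pred Flow 1" ) << Set Property( "Response Limits",
        {Goal( Minimize ), Lower( 10 ), Middle( 20 ), Upper( 30 ), Importance( 1 )}   // placeholder anchors
    );
    // Profile the saved formulas, then let JMP search the latent-variable space.
    prof = Profiler(
        Y( :QC1 R1 Pred Flow Target, :QC1 R1 Pred Flow 1, :QC1 R1 Pred Quality 1, :QC1 R1 T Square )
    );
    prof << Maximize Desirability;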
To do that, we'll use these same prediction scores that we had in...saved from the PLS platform, but this time we're going to use those in the model driven multivariate control chart   under quality and process.   I'm going to use the time and put that in time ID, so instead of seeing row numbers, we're going to see an actual date on all of our charts and we'll put the X scores into the process.   Unfortunately, the range doesn't always work correctly on these charts, so if I just zoom in a little bit, we'll see that here are periods where we had a high T squared and that high T squared was mostly the result of flows two, three, four and one, so all flows in that...   are the high contributors to this T squared. If we click on that and hover the mouse over one of those bars, if I click on that point again, then we'll see a control chart for that individual value...individual   variable with the point that we selected highlighted on that chart.   And I'm going to hide this one, though, and not only look at T squared, we also can do the same thing for DModX or SPE, if you're still looking at SPE. Once again,   this doesn't always work out correctly.   So we'll zoom in on the   DModX values again.   So DModX is going to indicate points that have a high or low value for an individual column, compared to what was expected based on the other rows, the other data in that row.   Here we can see that this point is primarily an outlier due to flow five.   I do find the contribution proportion heat maps to be pretty useful to look and see patterns in old data when, for example, one variable might have been a high contributing factor or acting abnormal for a long period of time or for some some section of time.   So this is a chart that might...that we might want to look at every morning, for example, or periodically to see, is the process acting normally?   You come in, you want to open this up and see, is there anything that I should adjust right now in order to bring our process back under control?   So to do that, we want to recreate this whole analysis from pulling PI data to opening the MDMCC platform and have it all be available at a click of a button.   To do that, we're going to write a JSL script that has three steps. It's going to get the data from PI, we're going to add the calculated columns, and then open the model we have in the multivariate control chart platform.   getting data from PI. If we go back to the original data table that we...   that was opened after the PI tools add-in was run, we can see this PI data source script saved to that table. If we edit that script and just copy it, we can paste that into the new script window.   I'm just going to make one change. Instead of pulling data from May, we'll start pulling data from August 1 instead.   Now we need to add those calculated columns. So remember we...in the PLS platform we use the saved score...save as X score formulas option.   In order to recreate those, we can just select all of the columns in the data table and use the copy column script function that was added again in that scripting tools add-in.   Once we copy the column script, we go back into this new script that we created and we'll paste what are a bunch of new column formulas to again recreate all of those columns.   Finally, model driven multivariate control   chart has an ??? most other platforms, where you can save the script to the clipboard and you can paste that into the same script window.   
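Put together, the three-step "score on demand" script has roughly the shape below. Everything in it is schematic: the SQL stands in for the saved PI Data Source query, the two formula columns stand in for the ones copied with the scripting-tools add-in, and the MDMCC launch uses assumed score-column names.
    //! The leading //! makes the script run as soon as it is opened.
    // Step 1: re-pull current data (stand-in for the saved "PI Data Source" SQL).
    dt = Open Database(
        "DSN=PIDAS;",
        "SELECT Timestamp, Flow1, Flow2, Flow3, Flow4 FROM ProcessView",
        "Process Data"
    );
    // Step 2: recreate the scored columns (stand-ins for the formulas copied with
    // the scripting-tools add-in's copy column script option).
    dt << New Column( "QC1 R1 T1 Score", Numeric, Formula( 0.42 * :Flow1 - 0.31 * :Flow2 ) );
    dt << New Column( "QC1 R1 T2 Score", Numeric, Formula( 0.18 * :Flow3 + 0.55 * :Flow4 ) );
    // Step 3: open the control chart on the latent scores, with the timestamp as Time ID.
    Model Driven Multivariate Control Chart(
        Process( :QC1 R1 T1 Score, :QC1 R1 T2 Score ),
        Time ID( :Timestamp )
    );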
Now, if we imagine it's a new day and we have everything closed, and I want to look at how our process is doing, I would just run this same script. Note that I could start it with   a specific slash slash and then an exclamation point in order to run automatically when the script is opened.   When I hit run, it's going to open a data table that looks just the same as our original table. It's got all of the same columns.   It's added those calculated columns, so let's put these scores in and all the predictions, and it also opened the model driven multivariate control chart platform   where we can see that this recent value for DModX is actually pretty high, so the most recent data has some abnormal values, in this case, for quality three.   So again, quality three looks like it's not at the expected value, based on the other parameters. In this particular case, that might mean,   since quality three is an input parameter and quality one, two and three are often related, that might mean that quality three is a bad lab sample or it could mean that this is a new material that we haven't worked with before.   Okay, finally, let's talk about one method to run this model continuously. So this was recreated on demand,   where we wrote a JSL script to run this, but sometimes it's beneficial   to...or we found it beneficial to write these results back to our PI historian so that they can be used by   the operators at our facilities. So in the last couple of minutes, I want to introduce that add-in, it's called the model engine add-in,   which will quickly score models online. I should note that this should be used for advice only. It's not an acceptable method to provide critical information or for closed loop control.   For that you might consider exporting the formulas from the formula depot and using some other software package to score them.   As mentioned earlier, some of the predictions and information available in this model   has most has the most value at the moment it's happening. So knowing what caused yesterday's problem is great, but knowing what's happening right now   means making decisions with current model results and it allows some problems to be fixed before they become a big deal.   Of course there's many ways to score models, but the power of JMP scripting language, JSL,   provides a way to get predictions and anomaly information in front of operators at our manufacturing facilities using their existing suite of visualization and trending tools that they're already used to.   A pair of model engines, or computers running JMP with a specific add-in that started the task scheduler, are set up to periodically run all the models stored in a specific directory.   All the configuration is done via two types of configuration files, a single engine configuration file and one model config file for each model that's going to be scored.   Let's start with that model config file. Remember how the PI tools add-in saves source information to the data table?   Now that same information can be used to populate the model config file, which tells the model engine how to download a table with a single row containing all of the model inputs that it needs to calculate values from.   Later, the scripting tools add-in quickly save the scripts to recreate columns saved from the PLS and any other platform,   potentially including T squared and DModX contributions that can be saved from the model version control chart platform. 
These new column scripts are also saved in the model config file or in a separate file in that same directory.   Finally, the engine config file defines how the engine communicates with the data source where PI tools   add-in...where the PI tools add-in uses the OLDB   and SQL queries to get data, the model engine uses the PI web API to read and write data directly to PI.   By defining a set of functions in the engine config file, this engine can communicate with many other data sources as well.   Notice a set of heartbeat tags are defined, which allows the data source and other model engines to know the status of this engine.   Each model also has its own set of heartbeat tags, so if one machine stops scoring a particular model, the other engine will automatically take over.   Again this model engine idea is not intended to be used for critical applications, but I found that it allows us to move very quickly from deployment and exploratory analysis to an online prediction or a quality control solution.   With that, thank you all for attending. Remember that more information on each add-in and the journal I use today are available in the JMP Community. Janice LeBeau, JMP awesome job awesome.
Aishwarya Krishna Prasad, Student, Singapore Management University Ruiyun Yan, Student, Singapore Management University Linli Zhong, Student, Singapore Management University Prof Kam Tin Seong, Singapore Management University   There are several reasons for a flight to be delayed, such as air system issues, weather, airline delays, security issues, and so on. But interestingly, the most frequent reason for a flight delay is not about weather but about air system issues. The Federal Aviation Administration (FAA) usually considers a flight to be delayed when it is 15 minutes or more late in arriving or departing than its scheduled time. Flight delays are inconvenient for both airlines and customers. This paper employs dynamic time warping (DTW) techniques for 54 airports in the US. The study aims to cluster airports with similar delay patterns over time. In addition, the paper builds some explanatory models to explain the similarity between different airports or distances. In this analysis, we aim to use the time-series techniques to discover the similarity in the top 15% busiest American airports. This paper first filters the top 15% busiest American airports and calculates the departure delay rate for each airport and then uses DTW to cluster these airports based on departure similarities. Next, the similarities and differences between clusters are identified. This analysis will help inform passengers and airport officials about departure delays at 54 American airports from January to June 2020.      Auto-generated transcript...   Speaker Transcript ZHONG, LINLI _ Okay let's get started. Hi, everyone. This is the poster of time series data analysis of flight delay in the US airports from January 2020 to June 2020. We are students of Singapore Management University. I'm Linli. YAN Ruiyun I'm Ruiyun. Aishwarya KRISHNA PRASAD And I am Aishwarya Krishna Prasad. Now let's quickly dive in to the introduction of our project. Over to you Linli. ZHONG, LINLI _ Thank you, Ash.   In the left hand side, we can see that there is a line chart. This shows the annual passenger traffic at top 10 busiest US airport and...   in in the...from the graph, we can see that the number of the passengers in each airport experienced a sharp drop. This is because the passengers in airports showed the response to the spread of the COVID-19 in 2020.   And for our analysis, we would like to discover the delay similarity of top 15% of airports in America from features of the delay and geographic location.   time series, dynamic time wrapping, exploratory data analysis.   The time series and DTW are employed to find out the similarities between the clusters, based on the departure delays. EDA is used to draw the geographic map. Okay, let's go back to the data set.   Thank you, this is the data set.   Actually, our data set comes from the United States of Department of Transportation and from our   data preparation in the left hand side, this is the process of our data preparation. We firstly imported the csv file into JMP Pro 16.0.   And then we remove the columns and values which are not really useful for our analysis.   And after that, for the data transformation, we summarize the data for airports from different cities, and then we filter out the   top 54 airports, which is based on the total number of the fights in each airports and calculate the rate of the delay.   
And after the data preparation, we save this file as SAS format and we import the SAS format into the SAS Enterprise Miner 40.1 for our further analysis,   namely the DTW analysis and time series analysis. After DTW process JMP Pro 16.0 was used again by finding out the singularity of different clusters and draw geographic maps.   And this is the introduction for data set. Let's welcome my partner to introduce more about our analysis. Aishwarya KRISHNA PRASAD Thank you, Linli. Now let's dive into the time series and cluster analysis. So we did the time series and cluster analysis using the SAS Enterprise Miner.   So this graph is one of the outputs that we obtained using the DTW nodes in SAS Enterprise Miner. So in the X axis, you can see that, you know, it contains the months from January 2020 to June...to July 2020.   And in the y axis, we can observe that there's a percentage of delays in the flights that we have included in our data set.   Now we can see that there is a sharp spike in the early February and in the late June, which seems to be strongly correlated with around the holiday periods of USA.   But, in general, other than these two spikes...major spikes, we can also see a steady decrease in the number of flight delays in general.   We then performed a time series clustering based on hierarchical clustering and the constellation plot of the same can be observed over here, using SAS. And we chose that...we felt that the number of clusters (7) is the most optimal number of clusters for our analysis.   Now, these are the clusters that are formed by using the TS similarity node of the SAS Enterprise Miner, so let's just take a...quickly take   the instance of Cluster1. So in this Cluster1, it contains mostly the international airports in the US.   So some of these airports are the Denver international airport, the Kansas City international airport, the Washington international airport,   just to name a few. So the delay in these airports are pretty large, as you can see, and this can be attributed to, you know, because this is located in the city that is frequented by tourists.   So similarly, the remaining clusters are formed by this similar behavior of the delays that are experienced in the flights.   Now the clusters that were generated in the previous step was then fed into the JMP Pro, and using the Graph Builder functionality,   we were able to build these graphs. So this graph contains the causes of delays in each of the clusters. So in over here, we can clearly see   which causes of delay is more prominent in each cluster. So for example, as you can see for Cluster1,   the late aircraft delay, that is, the delay caused by the previous flight to the current flight is more prominent compared to the rest. And the same queue follows for the rest of the clusters.   But if you see this cluster, right, so although this visualization in SAS is pretty intuitive,   we felt like for a...for a data set with large number of points, or more number of airports in our case, it would be quite difficult to analyze. So I'm just calling upon my peer Ruiyun to present another approach to analyze the clusters. Over to you, Ruiyun. YAN Ruiyun Okay, geographic location is another part that we focused on. The clusters were formed in SAS then we used Graph Builder feature in JMP Pro 16.0 to generate this map to   show where the different airports are located by cluster. Obviously airports from western and middle US are only included in Cluster 1 and Cluster 3.   
And these two clusters show that cluster is not distributed in a specific region.   Cluster 2, Cluster 4, Cluster 5 and Cluster 6 demonstrate an aggregation of airports with specific region.   Airports from Cluster 2 are mainly concentrated in eastern United States, while the Cluster 5 and 6 are more likely contain the airports of some   tourist attractions, such as Houston, Phoenix, Baltimore, and Honolulu, which are the largest cities of Texas, Arizona, Maryland and Hawaii.   Even more to the point, Phoenix Sky Harbor International Airport is the backbone of national airlines and southwest airlines. That's   one of the key transportation hubs in the southwest America. In addition Cluster 7 is a particular case, as it just has one airport, San Juan airport from Puerto Rico.   We surmised that because of the special geographical location, any flight departing from San Juan airport has a long distance to travel.   And that's all about the geographical analysis and now my partner Aishwarya will give us a conclusion. Aishwarya KRISHNA PRASAD So in conclusion here, we tried exploiting the ease of usage of the DTW nodes in the SAS Enterprise Miner and also the sophisticated visualization and pre processing techniques in JMP Pro 16.0 to perform our time series analysis for our flight data.   So we performed the dynamic time clustering for 54 airports. And these airports were formed into seven clusters, based on the delay patterns during January and June 2020.   We observed that the carrier delay is mostly the main reason for delay in each cluster, while the late aircraft delay is not very far behind on being a major cause of delay in most of the clusters.   As part of the future work, one can include the COVID data points to improve this analysis further and also discuss the correlation between the delay and the cancellation rate of flights.   Thank you so much for listening to us. I hope you liked our presentation.
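To make the clustering step concrete, the sketch below shows a minimal dynamic time warping distance plus a hierarchical clustering cut at seven clusters, matching the number of clusters chosen in the study. It is only a plain Python/SciPy analogue of the DTW and TS Similarity nodes used in SAS Enterprise Miner, not the authors' exact workflow, and it reuses the hypothetical delay_rate table from the earlier data-preparation sketch.

```python
# DTW-based time-series clustering sketch (analogue of the SAS EM DTW nodes).
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping distance between two series."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

series = delay_rate.fillna(0.0).to_numpy()
airports = delay_rate.index.to_list()

# Pairwise DTW distances, then a hierarchical clustering cut into seven clusters.
n = len(series)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = dtw_distance(series[i], series[j])

labels = fcluster(linkage(squareform(dist), method="average"), t=7, criterion="maxclust")
for airport, label in zip(airports, labels):
    print(airport, label)
```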
Suling Lee, SMU, Singapore Management University   COVID-19 vaccines play a critical role in the attempt to assuage the global pandemic that is causing surges of infections and deaths globally. However, the unprecedented rate at which they were developed and administered raised doubts about their safety in the community. Data from the United States Vaccine Adverse Event Reporting System, VAERS, has the potential to help determine whether the safety concerns about the vaccines are founded. As such, this paper uses a combination of both structured and unstructured variables from VAERS to model the adverse reactions to COVID-19 vaccines. The severity of the adverse reaction is first derived from the variables describing the vaccine recipient's outcome following a reaction, as recorded in the VAERS data sets. Next, unstructured data in the form of text describing symptoms, medical history, medication, and allergies are converted into a Document Term Matrix; these, combined with the structured variables, help to build a model that predicts the severity of the adverse reaction. The explanatory model is built in JMP Pro 16 using Generalized Regression models and a Binary Document Term Matrix (DTM), with model evaluation based on the RSquare value of the validation set. The optimal model is a Generalized Regression model using the Lasso estimation method with the Binary DTM. The key determinants of adverse reaction severity from the optimal model are the number of symptoms, the period between vaccination and symptom onset, how the vaccine was administered, the age of the patient, and symptoms related to cardiopulmonary illness.       Auto-generated transcript...   Speaker Transcript Peter Polito How are you doing today. Can you hear me. If you are speaking, I am unable to hear you. Hello. test test. Oh Hello Leo can you hear me. yeah sorry about that I think of the technical difficulties yeah. Peter Polito Oh no problem at all. Oh okay it's a nice Jimmy. Peter Polito Oh, where. Are you calling in from. Singapore. Peter Polito Singapore all right, how how late, is it there. About 909. Peter Polito yeah well. yeah kudos to you for. hanging on. so late thanks for making it. Possible no it's all right um yeah I hope I do a good one. Oh. No i've been like high stress about this Oh well, yeah i'm Okay, let me just put on a virtual background. Okay. Peter Polito And I gotta go through just a couple things on my end before we officially start. Okay. Peter Polito just give me a. minute here. yeah. Peter Polito I only bring in my checklist make sure I do everything correctly here. alright. So just to confirm. You are Suling Lee. yeah and your talk is titled A Model for COVID Vaccine Adverse Reaction. Yes, is that correct yeah that's right. Peter Polito All right, and then just to make sure you understand this is being recorded for use in the JMP Discovery Summit conference and will be available publicly in the JMP user community do you give permission for this recording and use. Yes, okay great. your microphone sounds good, I don't hear any background noises is your cell phone off and all that kind of thing anything that might make some random noises. hang on. i'll send it will find them yeah. Peter Polito Okay, and then. We need to check the can you go and share your screen and we'll go through and check the resolution and a few other things. Okay sure. Peter Polito Thank you. i'm sorry, is it Lee Suling or Suling Lee. My first name is Suling, Lee yeah. Peter Polito got it okay. um is it okay. 
Peter Polito I don't see it yet oh um you know pie covering it just a moment. That looks good. And if you go to the bottom of your screen does your taskbar actually I don't see your taskbar so we're good. Okay. Peter Polito And let me make sure. It are any programs that might create a pop up like outlook or Skype or any of those are those all closed down and quit. um yeah, I think, so I bet the checking on them. close my kitchen. Okay. yeah good. Peter Polito Okay, and then, are you going to be working just from a PowerPoint or you be showing jump as well. I was worried of the transiting, so I will be destiny it from PowerPoint. Peter Polito Okay, great, then I am going to mute and turn off my camera we are already recording so as soon as we. As soon as you see my picture go away go ahead and start and i'm not going to interrupt for any reason and we'll try and go through it's a 30 minute presentation so let's go through, and I won't even be here it'll be like you're talking to yourself. Okay, so that the Minutes right okay. Peter Polito Are you ready to go. yeah. Peter Polito Okay. All right, and it's just so you know when we actually. Have the discovery summit, if you realize tomorrow that you misspoke or you wanted to present something in a slightly different way. You can be live on the when your presentations going, and you can ask the person presenting a deposit and then you can say you know i'm about to say this, what I what I wanted to convey is that you can kind of like. edit in real time during the presentation so don't stress about getting every word perfectly just relax and and go through it and and i'm sure will be just fine. All right. yeah. Peter Polito All right, i'm gonna mute and turn off my camera and then you go ahead and begin okay. Thanks Peter. Hello, I'm Suling. I'm a master's student at Singapore Management University where I'm currently pursuing a course in data analytics at the School of Computing and Information Systems. So I'm actually here today to present an assignment that has been submitted for my master's in IT for Business program and, more importantly, I want to share my JMP journey so far. So I started using JMP this year and I really fell in love with it because of the ease of use and the range of statistical methods and the visualizations that I could do on it. So, as the beginner using JMP I'm really honored to be here presenting my report and do let me have your feedback, because I feel that I have to so much more to learn yeah. So the motivation for my paper was actually to look at the COVID 19 vaccines, so we know how important they are but at the unprecedented rate at which it was developed and administered has raised some doubts in the community regarding its safety. So we are using data from the United States vaccine adverse event reporting system, yes. So we are using data from there because we find that there's a potential to help determine if the safety concerns on the vaccine are founded. So this paper makes use of both the structured as well as the unstructured data from VAERS to model, the adverse reaction of COVID-19 vaccines. So what is VAERS? So the Center of Disease Control and Prevention and the US FDA have had this system, and it is actually a adverse event system where it collects data. But generally what we see is that VAERS data that cannot be used to determine causal links to adverse events because the link between the adverse event and the vaccination is not established. 
So what we actually see here is that people are reporting the events, but there is no follow-up action to confirm that the symptoms and events that are reported have any link to the vaccine. So why do we still want to use this data? Firstly, the data is available in the public domain. The data is up-to-date and, more importantly, not all adverse events are likely to be captured during the clinical trials due to low frequency. Usually, clinical trials include only healthy individuals. Special populations, like those with chronic illnesses or pregnant women, are limited, so VAERS is an important source for vaccine safety. For more information regarding it, you can look up this link over here. Yeah, so the data set used for this study comes from three data tables extracted from VAERS. The first one is the VAERS data. It mainly contains information about the patient's profile and the outcome of adverse events. What I have here is a little clip from JMP where we have the symptoms text, and you can see that this is just one report based on one patient. Okay, and the data is quite dirty. There is a lot of useful information in the narrative text, but you can see that there are spelling errors, typos, and excessively long or very brief statements. The next two data sets are the VAERS vaccination data and the symptoms data. One contains information regarding the vaccine; the other is extracted from the symptoms text that we can see. Okay, so given this accessibility, VAERS data has actually been mined by quite a number of researchers, but as you can see, the data is very challenging to use, as the quality of the reports varies. And there are also some reports that might not be genuine. A review of prior work shows that some form of manual screening is usually employed to extract the required information. However, this is also quite labor intensive and quite difficult, so this paper aims to showcase methods to extract the key information using text analysis techniques in JMP and to build an explanatory model to explain the most important variables involved in these events. So what we did is, for each of these data tables over here, we cleaned them individually and then joined them using the VAERS ID. What we did based on the patient outcomes was to derive something called a severity rating; I'll talk a little bit about this a bit later. Once the tables are joined, there are four narrative text fields: allergies, medication, medical history, and symptoms. We then use text analysis techniques to extract the vectors for the top terms that explain the severity rating for each of the text fields, and join them with the existing structured variables in the data set. Okay, and then all this is compiled together and put into model building. Okay, so what is this severity rating all about? It's based on the patient's outcome; the VAERS data has 12 variables that describe the status of the patient. Based on this, we extracted the variables and tried to make sense of them, and we came up with four levels of severity, which we call the severity rating. So next we will talk a little bit about how we use the JMP Pro Text Explorer platform for text analysis, and we start off with the data cleaning. 
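A hypothetical sketch of the join and severity-rating step just described is shown below. The outcome flag names (DIED, L_THREAT, HOSPITAL, ER_ED_VISIT) follow VAERS-style column naming, but the exact four-level mapping shown here is an illustrative assumption, not the author's published rule.

```python
# Join the three VAERS tables on VAERS_ID and derive a 4-level severity rating
# (illustrative mapping; column names are assumptions modeled on VAERS files).
import numpy as np
import pandas as pd

data = pd.read_csv("VAERSDATA.csv")          # patient profile and outcomes
vax = pd.read_csv("VAERSVAX.csv")            # vaccine information
symptoms = pd.read_csv("VAERSSYMPTOMS.csv")  # coded symptom terms

merged = data.merge(vax, on="VAERS_ID").merge(symptoms, on="VAERS_ID")

def flag(col):
    # "Y"/missing outcome flags -> boolean; missing columns count as False
    return merged[col].eq("Y") if col in merged else pd.Series(False, index=merged.index)

# First matching condition wins.
merged["SEVERITY"] = np.select(
    [flag("DIED") | flag("L_THREAT"),  # 4: death or life-threatening event
     flag("HOSPITAL"),                 # 3: hospitalization
     flag("ER_ED_VISIT")],             # 2: emergency-room visit
    [4, 3, 2],
    default=1,                         # 1: none of the above outcomes
)
```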
So what we wanted to do was to really extract out the significant terms from the text data. And augment them to the structured variables to build your model. So as you can see, actually, the text data is quite quite messy so what we did was first of all, decide between using the term frequency, what kind of term frequency to use, and then the binary term frequency was selected, as the data shows that there's a significant advantage, of considering, of using it. So next a little bit about the cleaning that came in. So the the text data was first organized using the JMP Pro Text Explorer and we used a useful feature that is in there to add phrases and automatically identify the terms so what you can see it's like terms like white blood cells are kept as a phrase instead of being pulled into white blood and cells, which will not make much sense. And a few other methods as well to use. So one is the standard for combining which stemmed the words based on the word endings and then we also thought to sort the list alphabetically in order to recode like misspelled words or typo errors or what's that similar. Yeah and then the next thing was to use the very handy function to recode all the similar items together yeah. So the next thing we do after cleaning out the text was to look at the was the look at the workflow actually. The workflow is useful for stop word exclusion and to see the effect of the target variable on the terms. So what I did over here was to visualize the most frequent terms by the size and color it based on the severity rating. So you can see that the lighter colors belong to the less severe cases and darker ones are the other most severe ones, and you can like pick up, then the words is quite small. And it really shows that the common symptoms are not serious but we picked up terms by the cerebral vascular incident pulmonary embolism and things like that. So these are related to the most severe adverse event. The next thing we use the term selection, so the term selection is new feature JMP Pro 16 which, which was quite timely. So it is integrates the generalized regression model into text analysis platform so following from the text analysis platform, you can just select this where term selection is. And then, it allows the identification of key terms that are important to the response variable. So our response variable is the severity rating. So why use the generalized regression model? So it is widely used for non normally distributed or highly correlated variables. Where the data are independent of each other and show no other correlation. So this method over here is useful for us because it fits our our data set. And each role that we have inside our data table is a patient and all those are independent of each other. So, and then the most important thing is also that the generalized regression model allows for variable selection, so that is what we want to do because we want to pick up the variables with the highest influence on the response variable. Yeah so a little bit more detail about this regression model there's a few options that we can use over here that's the elastic net, as well as the lasso. So over here are the different thing about these is the lasso tends to select one term from a group of correlated factors, whereas the elastic network net will select a group of terms. So generally, I think that elastic net is used, and then over here that's our choice of the binary term frequency came in. 
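As a rough analogue of the binary document term matrix built in Text Explorer, scikit-learn's CountVectorizer with binary=True produces one 0/1 column per term. This sketch does not reproduce Text Explorer's phrase handling, stemming, or recoding steps; SYMPTOM_TEXT is an assumed column name, and it reuses the hypothetical merged table from the previous sketch.

```python
# Binary document term matrix sketch (scikit-learn, not JMP's Text Explorer).
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(
    binary=True,           # binary term frequency, as chosen in the paper
    stop_words="english",  # crude stand-in for the stop-word exclusion step
    ngram_range=(1, 3),    # keep short phrases such as "white blood cells"
    min_df=25,             # drop very rare terms and one-off typos
)
dtm = vectorizer.fit_transform(merged["SYMPTOM_TEXT"].fillna(""))
dtm_df = pd.DataFrame.sparse.from_spmatrix(
    dtm, columns=vectorizer.get_feature_names_out()
)
print(dtm_df.shape)   # one row per report, one 0/1 column per term
```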
Okay, so this is the result of the term selection, so you can see that over here that shows you the overview of the (???) and then generalized (???) but more interestingly when you started by the coefficient you can see that these are the top positive coefficients So these are top factors that plays the biggest role in terms of our response variable. And this one over here are the symptoms that plays least role when sort that according to the coefficient. So looking at the results, you can see that cardiac arrest and COVID-19 pneumonia, cerebral vascular accidents just all the terms that affects the response variable. So we can see that terms of more serious nature are related to the heart and lungs okay as versus the more low frequency ones right, which are very, very mild symptoms, really. Okay, so we repeat this whole process for all of the other for all of the other text variables, so we have gone to the the example for symptoms, so there's also the allergies, the medical history as well as the medications that are used, so what we did later was to save the document term matrix. Okay, which is basically the DTM is saves a column to the data table for each time. So you can see over here an example, you mainly a lot of zeros because it's a very sparse matrix so one will indicate the presence of let's take this column over here one will will indicate the presence of (???).. So we save it and repeat the process for the other text analysis. And then, once we have all these terms saved up we moved on to modeling. So therefore modeling was to build in kind of like a validation column so over here, we went to predictive modeling and make validation column. So over here we selected the choices so put it as validation set up 55%. And the whole thing over here was to identify the important variables with severity as a response variable So all in all we have seven structure variables and 55 that were derived from document term matrix and a total of about 31,000 rules. And what we see is that there's an imbalance there because of a severity rating you get an unbalanced data set. So because of this, we done our model evaluation on comparing the R split and the AIC values. So. We use the fit model in JMP so and choosing the generalized regression model again. And we can see that these are the results here so separate models you think the group of generalize linear models, using the penalized regression techniques were prepared. And then we try to fit based on the various characteristics over here, these are all the other other the penalized estimation methods. So of all the models, we can see that the lasso method has the lowest sorry has the highest Rsquare value, and there are other values that quite close as well, so we are going to take a closer look at them. So comparing the maximum likelihood model, as well as the lasso model based on the ROC curve, you can see that actually both of the ROC curves are quite similar. And however the ROC curve for the maximum likelihood model shows that it has the highest severity. Sorry, ROC curve for the maximum likelihood model shows that the ROC value is higher for highest severity rating, and you can see that it's only a slight difference here between both of these. And in general as as you go down the severity rating the area actually do increase and one of the reason is because our data set is very unbalanced. 
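A loose scikit-learn parallel to this modeling step (not JMP's Generalized Regression platform) is sketched below: a held-out validation split, a lasso (L1) multinomial logistic fit that zeroes out most term coefficients, and a near-unpenalized fit for comparison by validation ROC AUC. It reuses the hypothetical merged, dtm, and vectorizer objects from the sketches above; the structured-variable names and the split fraction are assumptions.

```python
# Lasso-penalized model with a validation holdout (illustrative parallel only).
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

structured = merged[["NUMDAYS", "AGE_YRS", "N_SYMPTOMS"]].fillna(0).to_numpy()
X = hstack([csr_matrix(structured), dtm]).tocsr()   # structured vars + binary DTM
y = merged["SEVERITY"].to_numpy()

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.45, stratify=y, random_state=1)

lasso = LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=5000)
mle = LogisticRegression(C=1e6, max_iter=5000)      # effectively unpenalized
for name, model in [("lasso", lasso), ("maximum likelihood", mle)]:
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_valid, model.predict_proba(X_valid), multi_class="ovr")
    print(f"{name}: validation ROC AUC = {auc:.3f}")

# Terms surviving the lasso penalty play the role of the selected key terms.
names = np.array(["NUMDAYS", "AGE_YRS", "N_SYMPTOMS"]
                 + list(vectorizer.get_feature_names_out()))
weights = np.abs(lasso.coef_).max(axis=0)
print(names[np.argsort(weights)[::-1][:20]])
```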
So the severity rating of four, which is the highest level, the most of your level is only about 5% of the total data values okay so overall this actually very little difference between both of these so we choose the one with a slightly larger area. Okay, so our next be turned on to the effects test. So into the report, you can choose to see the effects test, so the effects test is the hypothesis test of the null hypothesis that the variable has no effect on the rest. There was this very nice explanation of the effects test on the JMP Community, I think it was contributed by Mark Bailey. So he talks a bit about how the effects test is actually based on the Type III sum of squares for ANOVA. So we can see that the effects test is very suitable over here because of our data set so it actually tests every term in the model in terms of every term in it. So the main effects are tested in a lack of the interactions between the items and in the light of the other terms, in the light of the other main effects as well. So what do you want to use here is that we see over here is that the effects test is useful for our purpose, as it is for model reduction. And, and it allows us to draw inference of the long list of significant variables. We look at the probability at ChiSquare (???) lowest ChiSquare value taking a cut off alpha value of 0.1. We have a number of independent variable so that's quite a long list of them, and most of them, as you look through most of them actually related to the cardiopulmonary illnesses. So some of them are the effected ones like the number of symptoms, the number of days between the vaccination onset. (???) is the more in which the vaccine centers that by each and then you can see that the rest of them are related somehow another to. cardiopulmonary illnesses, there are some strange ones that I don't come from a medical background, so I don't really understand it either, but you can see that deafness is one of them, so there are some strange results that we can see over here, but in general that's, the picture is that, in terms of the top variables in terms listing variables. Okay besides this, right, what is really interesting is look at the model evaluation so even though what we're doing is to build an explanatory model, I went to look into the predictive model as well because JMP has very nicely put report over there for me to look at the parameter estimates so So I use the Profiler to try to understand the parameter estimates and you can see over here that the values shown are really, really small, so this is the value that you get immediately when you open up the Profiler, so the values here are the average of each variable and you can see that each of these variables Of the each of these values here actually very small, so it means that there's very little effect on the severity based on these coefficients. Based on these as a coefficient of the predicted variables. So what you can see that based on this study over here, you can tell that actually (???) symptoms and its effect has very, very little effect on severity and this is kind of like. kind of a within expectation to see that most of these symptoms and effects, because we are looking at the general picture of the vaccine, we can see that most of these symptoms - medical history, allergies - have very little effect actually on the outcome of the of the vaccine. Okay so. yeah. So a little bit of a conclusion, a few statements as a concluding statement. 
Several decisions were made in the grouping and classification of variables. And although these variables were made to the best of our understanding, especially in the way in which we came up with a severity rating, We perhaps need an expert familiar with vaccines studies or clinical trials to be consulted as to whether or not the severity rating is sufficient to to score the adverse events outcomes. And based on the model building of structured and unstructured data we have identified key factors that varies with the severity in a reaction to a COVID-19 vaccination. However, we're still not the effect of these key variables on the response variable severity is very small, so this is seen by looking at the variables. And then, finally, the document term matrix based on the binary ratings, the binary term frequency was found to be the most effective in representing the weights other terms in the document. And the generalized linear model with the lasso penalized regression technique produced the optimal model. So I hope you enjoyed the very short presentation and do let me know if you have any questions or any feedback, thank you very much. Peter Polito great job. was very. Oh no. I just realized that i'm you know mistake we wanted this life oh gosh. Peter Polito So that this is the exact situation where well you're. So at the actual discovery summit there's going to be a presenter. And so they're going to reach out to you, ahead of time and you just say, I have a mistake on one of my slides, and so in this part comes up it'll just pop he or she will pause it. And then you can share the slide and talk about it and then go right back to the video so don't worry about it at all. This happened quite a bit during last year's discovery summit is not a problem at all. Okay okay. Well gosh oh James. Peter Polito Would you like to fix it and redo it would that make you feel better. I don't know. If it's actually this one here, because this is the wrong box, it will take me a while to actually fix it because I need to retype it over yeah. Peter Polito yeah then it didn't know don't worry about it it'll be a real easy fix and you can do it in real time. Okay okay. Okay yeah. Thanks for sitting, through it, though. Peter Polito yeah no problem is great, I really. Okay. Peter Polito All right, any other questions or comments or anything. yeah i've got one is that um so what's up a link, where I can upload all of my slides and my people and things like that, but there was a mixup with my. email, and I think from tanya about that that when tanya replied me right, I think she missed out on that link, so I thought that the link will be embedded inside one of the recording but I don't suppose you guys got it right. Peter Polito I don't have it, but I will reach out to tanya and asked her to reach out to you directly. To help remedy that. Okay sure thanks very much I think that makes up with my email yeah. Peter Polito Thanks very much. No problem alright well have a have a good night and good rest. yeah you have a good day. Peter Polito Thank you. So much bye bye.  
Laura Castro-Schilo, JMP Sr. Research Statistician Developer, SAS   The structural equation modeling (SEM) framework enables analysts to study associations between observed and unobserved (i.e., latent) variables. Many applications of SEM use cross-sectional data. However, this framework provides great flexibility for modeling longitudinal data too. In this presentation, we describe latent growth curve modeling (LGCM) as a flexible tool for characterizing trajectories of growth over time. After a brief review of basic SEM concepts, we show how means are incorporated into the analysis to test theories of growth. We illustrate LGCM by fitting models to data on individuals' reports of Anxiety and Health Complaints during the beginning of the COVID-19 pandemic. The analyses show that Resilience predicts unique patterns of change in the trajectories of Anxiety and Health Complaints.     Auto-generated transcript...   Speaker Transcript Lauren Vaughan See.   Okay, we are recording okay all right, if you could please confirm your name company and abstract title. Laura Castro-Schilo, JMP Laura Castro-Schilo, JMP, SAS, and my abstract, okay, so it's Modeling Trajectories with Structural Equation Modeling, and that's it, right. Lauren Vaughan yep and um let's see you do understand this is being recorded for use in the JMP Discovery Summit conference, and it will be available publicly in the JMP user community; do you give permission for us to use this recording. Laura Castro-Schilo, JMP Yes. Lauren Vaughan Excellent OK, I will turn it over to you Laura. Laura Castro-Schilo, JMP Right, and so I just.   have to share my screen somewhere here right.   Now.   But. Lauren Vaughan perfect. Laura Castro-Schilo, JMP two seconds.   Hi, everyone. I'm Laura Castro-Schilo, and today we're talking about modeling trajectories with structural equation models.   We're going to start this presentation by first answering the question of why we would use SEM for longitudinal data analysis.   Then we'll jump into a very brief elevator version of an introduction to SEM. If this is your first exposure to SEM, I strongly encourage you to look for some of our previous presentations in Discovery Summits   that are recorded and available for you to watch, so that you can get a better understanding of the foundations of SEM.   But hopefully, even without that intro, this brief version will set you up to understand the material that we're going to talk about here today.   In that introduction, we're going to focus on how we model means in structural equation models. We're going to see that means allow us to extend traditional SEM into a longitudinal framework.   Modeling those means will have implications for what our path diagrams look like, and we'll also see how those diagrams map onto the equations   in our models. We're going to focus specifically on latent growth curve models, even though we can fit a number of different longitudinal models in SEM.   Then we'll use a real data example to show how we model trajectories of anxiety and health complaints during the pandemic.   At the end, we're going to wrap it up with a brief summary, and I'll give you some references in case you're interested in pursuing longitudinal modeling and want to learn more about this topic.   
So Singer and Willet are two professors from Harvard's School of Graduate Education, and I think they said it best   when they claimed in a popular textbook of theirs that SEM's flexibility can dramatically extend your analytic reach. Indeed, this is probably the most important reason why you might want to use SEM for longitudinal data analysis.   Now, specifically when we're talking about flexibility, we're referring to the fact that you can fit a number of different models   in SEM that are longitudinal models and can be quantified in terms of fit and can be compared empirically, so that you can be sure that you're characterizing your longitudinal trajectories in the best possible way.   There's a number of different models that we can figure, you can see them listed there and we can,   you know, things like repeated measures ANOVA, which can make some pretty strong assumptions about the data. SEM allows us to relax some of those assumptions and actually test empirically whether those assumptions are attainable.   SEM is also really flexible when it comes to extending the univariate models into a multivariate context. So if you're interested in looking at how changes in one process influence or are associated with changes in another process, SEM is going to make that very easy and intuitive.   Now we know SEM has a number of nice features, and all of those apply in the longitudinal context as well. Things like the ability to account for measurement error explicitly,   to be able to model unobserved trajectories by using latent variables and also using a cutting edge estimation algorithms for when we have missing data, which actually happens pretty often when we have longitudinal designs.   Another interesting feature is that it allows us to incorporate our knowledge of the process that we're studying. So we'll see that that prior knowledge about what we expect the functional form in our data to be can be mapped onto our models in a very straightforward way.   But there's also reasons why we should not use SEM for longitudinal analysis.   I think, most importantly, the structure of the data is what might limit us the most. So in SEM we're going to be   required to have measurements that are taking up the same time points across all of our samples. So say if, for example, we're looking at anxiety and we have repeated measures...   three repeated measures over time, the structure of the data have to be like what I'm showing you here right now, where, you know, we might have anxiety at one occasioin and that's   represented as one column, one variable in our data tables, and then we have anxiety at a second time point and at a third time point.   So what this means is that everybody's assessment of the first time point has to have taken place at the same time, and that's not always the case. And so there's going to be other techniques that are more appropriate if, in fact, your data are not time structured.   We also have to acknowledge the assumption of multivariate normality. Sometimes we can, you know...   SEM might be a little robust to this assumption, but we still need to be very careful with it.   And it's also in large sample techniques. So that data table I just showed you, you know, we really want to have substantially more rows than we have columns in the data, and this might not always be the case.   
So just as a reminder, if you haven't been exposed to SEM, this is also a nice brief intro,   is that in SEM, well, one of the most useful tools are called path diagrams, and these are simply a graphical representations of our statistical models.   And so, if we know how diagrams are drawn, then it'll be much easier for us to use them to specify our models and also to interpret   other structural equation models. So these are the elements that form a path diagram and you can see here that a square or rectangle are used exclusively to denote manifest or observed variables in our diagrams.   And that's in contrast to unobserve variables, which are always represented with circles and ovals. Now arrows in path diagrams are...   they represent parameters in the model. So double-headed arrows are always going to be used for variances or covariances, and one-headed arrows represent regressions or loadings.   In the context of longitudinal data, there's another symbol that is really important, and that is a triangle. The triangle represents a constant, and it's used in the same way that you use a constant in regression analysis,   meaning that if you regress a variable on a constant, you're going to obtain its mean. So we model means and we put some   constraints in the mean structure of our data by having a constant in our models. So let's take a look at a simple regression example.   If you wanted to, you know, fit a simple regression in SEM, this would be the path diagram that we would draw. So you can see X and Y are observed variables, we have X predicting Y, with that one-headed arrow, and both X and Y have   variances. In the case of Y, because it's an outcome, that's a residual variance.   And we also have to add the regression of Y on the constant if you want to make sure that we get an estimate for the intercept of that regression. So here, this arrow would represent the intercept of Y, and notice that we also have to regress X on that   constant in order to acknowledge the fact that X has a mean.   And now we can use some labels, so that we can be very explicit about which parameters these arrows represent. And then we can see how those   arrows...so we can trace the arrows in the path diagram in order to understand the equations that are implied by that diagram.   So let's focus first on Y. You can see that we can trace all of the arrows that are pointing to Y in order to obtain that simple, you know, regression equation. We have Y is equal to tau, one...times one (which is just that constant so we don't have to write the one down here)   plus beta one times X (which we have right down here) plus the residual variants of Y.   Now we also can do the same for X, because in SEM all of the variables that are in our models need to have some sort of equation associated with them. And here we want to make sure that we acknowledge the fact that X has a mean, so we regress that on the constant and it also has a variance.   So again, those path diagrams are away to depict the system of equations in our models.   And those diagrams, it's very important to understand that they also have important implications for the structure that they impose on the various covariance matrix of the data and on the mean vector.   And I think it's easiest to explain that concept by actually changing the model that we're specifying here. Rather than having a regression model,   what I'm going to do is, I'm going to fix all of those edges to zero. 
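Writing out the tracing rules just described, the simple-regression diagram corresponds to the equations and model-implied moments below. The symbols used here (ψ for variances, ζ for residuals) are just one common notation for the double-headed and residual arrows, not taken verbatim from the slides.

```latex
% Simple regression read off the path diagram
% (tau_1 = intercept of Y, tau_2 = mean of X, beta_1 = regression slope):
\begin{aligned}
Y_i &= \tau_1 \cdot 1 + \beta_1 X_i + \zeta_{Y_i}, \\
X_i &= \tau_2 \cdot 1 + \zeta_{X_i},
\end{aligned}

% so the model-implied mean vector and covariance matrix for (X, Y) are
\mu(\theta) = \begin{pmatrix} \tau_2 \\ \tau_1 + \beta_1 \tau_2 \end{pmatrix},
\qquad
\Sigma(\theta) = \begin{pmatrix}
  \psi_X & \beta_1 \psi_X \\
  \beta_1 \psi_X & \beta_1^{2}\,\psi_X + \psi_Y
\end{pmatrix}.
```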
So all of these effects, I'm just going to say I'm going to fix them to zero,   which is the same as just erasing the errors from the diagram all together, and now you can see how the equations for X and Y have changed there. This is a very...   well, this is a very interesting model. It's simple, but it actually has a lot of constraints, right, because it implies that X and Y have a variance but that their covariance is exactly zero; there's nothing linking these two nodes.   And it also implies that the means for both X and Y are exactly zero, because we're not regressing either of them on the constant in order to acknowledge that they have a non zero mean.   So now we if we really want to fit this model to some sample data, then that means we have some samples statistics from our data.   And the way that estimation works in SEM is we're going to try to get estimates for our parameters in a way that match the sample statistics as closely as possible, but still   retaining the constraints that the modal imposes on the data. And so, in this particular example, if we actually estimate this model, we would see that we are able to capture the variances of X and Y   perfectly, but the constraints that say that the covariance is zero and that the means are zero, those will still remain. And so the way in which we   figure out whether our models fit well to the data is in fact by comparing this model implied covariance and means structure to the actual sample statistics, and so we can look at the difference between those and obtain our residuals.   And these residuals can be further quantified in order to obtain metrics that allow us to figure out whether our models fit well or not.   Okay, so that's our intro for SEM, and these are going to be the concepts that we're going to be using throughout the presentation, in order to understand how we model trajectories with SEM.   Now what better way to start talking about trajectories then to imagine some data that actually have some trajectories. And so I want you to think for a second,   how anxious are you about the pandemic? If it had been asked of you early in 2020, when the pandemic was first started.   And perhaps a group of researchers approached, you they asked this question, and then they came back a month later and asked you the question same question again.   And maybe they came back a couple months later, and also asked about your anxiety. So we might obtain this data from a sample of individuals and the data would be structured in the way that is presented here, where each of those time points would be different variables in in the data.   And now let's imagine that we have the interest of looking at some of the trajectories from that sample, and we want to plot them so that we can start thinking about how we would describe these trajectories.   So let's take three individuals. This is going to be a fabricated example just to illustrate some concepts, but imagine that the first individual gives us the exact same score of three at   each of the time points that we asked this question. And maybe in this example, maybe anxiety ranges from zero to five, where five means there's...you're more anxious about the pandemic. So for this individual, the trajectory of this person is perfectly flat, right. It's a very   simple trajectory, and maybe for a person...individuals two and three, you know, maybe we get the exact same pattern of responses.   
And so, if this were to be real, and we had to describe these trajectories to an audience, it would actually be really easy   to do that, right, because we could just say, there's zero variability in the trajectories of individuals, and really, just describing a flat line would would do the rest, right. So we can use the equation of the line to say, you know, anxiety at each time point takes on   these values. And we would have to clarify, right, that the mean, or rather the the intercept for this line is equal to three and the slope is zero, so that we really just described that flat line.   Well that'd be a really easy to do, but of course this is a very unrealistic pattern of of data, so we're not expecting that we would observe this in the real world.   So let's imagine a different set of trajectories where there's actually some some variability on how people are changing.   And in this case, we could still find an average trajectory, right, a line of best fit through these data. And if we only use that the equation of that line to describe the data,   we would really be missing the full picture, right. That would not do a very good job of showing that some individuals, you know, number one is increasing, whereas   individual three is is decreasing. So instead we have to add a little more complexity to that equation we saw earlier, in order to account for the variability in the intercept and the slope.   So again, if we had to describe this to an audience, one thing we can do is in this equation, I'm adding a sub index I   to represent the fact that anxiety for each individual at each time point can take on a different value.   Now notice that the intercept and the slope for the equation also have that I,   indicating that we can have differences...you can have variability on the intercept and the slope, and we can still use the average trajectory to describe the average line, right, such that that intercept   can still be three and the slope is zero. But notice that we add these additional factors here that capture the variability   of the intercept and the slope, and, specifically, these are the values for each individual that are expressed as deviations from the average trajectory.   And then we'll see that we're going to have to make some assumptions about those factors in terms of their distribution, which should be normal with a   mean of zero and an unrestricted covariance matrix.   But even these trajectories are also quite unrealistic, right, because I'm showing you these perfectly straight lines. And when we get real data, it's never ever going to look that perfect. Indeed,   these three trajectories are much more likely to look like this, right, where even if we are assuming that there is an underlying sort of an unobserved linear trajectory,   those are not the trajectories we observed. In other words, we have to acknowledge that any data that you observed at any given time point is going to have some error, right. And so   we're still able to capture that error into our equation and we'll make some assumptions about that error being normally distributed.   But again, the idea is that we have these unobserved error free trajectories and that's not what we really get when we are observing the individual assessments, right, into in our data.   So our equation is going to describe that average trajectory and it's also going to describe the individual trajectories as departures from the individual line...I'm sorry, from the average line.   
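Collecting the pieces of the narration above, the linear latent growth curve for a score y measured at occasion t on person i can be written as follows. The time codes λ_t are 0, 1, 2 for equally spaced occasions (and, for example, 0, 1, 3 for a one-month then two-month gap, as used later in the talk).

```latex
% Linear latent growth curve: individual intercept I_i and slope S_i as
% deviations from the average trajectory (mu_I, mu_S), plus occasion error.
\begin{aligned}
y_{ti} &= I_i + \lambda_t S_i + \varepsilon_{ti}, \\
I_i &= \mu_I + \zeta_{Ii}, \qquad S_i = \mu_S + \zeta_{Si}, \\
\begin{pmatrix} \zeta_{Ii} \\ \zeta_{Si} \end{pmatrix}
  &\sim \mathcal{N}\!\left(\mathbf{0},
    \begin{pmatrix} \psi_{II} & \psi_{IS} \\ \psi_{IS} & \psi_{SS} \end{pmatrix}\right),
  \qquad \varepsilon_{ti} \sim \mathcal{N}(0, \theta_t).
\end{aligned}
```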
Alright, so now, everything that we have described so far is actually known as a linear latent growth curve model in SEM.   And if this looks like a mixed effects or random coefficients model, if you're familiar with those, it's because it is actually very, very similar.   Now, we only have three time points here, so this is a very simple linear growth curve, but we can still have   more complex models that incorporate some nonlinearities if, in fact, we have more time points so that we're able to capture those nonlinearities. We can do that with polynomials, and there are other ways as well to capture nonlinearities in growth curve models.   Today we're going to keep it very simple and stick to the linear models, though.   All right, now I want to bring it all together by showing you how those equations of the linear latent growth curve model can be mapped to a path diagram that can be used to fit our structural equation models. We're first going to start   by using the simplest equations here, the equations for the intercept and the slope. Remember that the intercept and slope represent unobserved values, right, they represent   unobserved growth factors, and so we're going to use latent variables, these ovals, to represent them in our path diagram. Notice that the intercept is equal to a mean plus that   variance factor, right. That is why we regress the intercept on the constant in order to obtain its mean, and we also have this double-headed arrow in order to represent that variability in the intercept.   And we do the same for the slope. Now notice, we also have a double-headed arrow linking the intercept to the slope, and that is to represent the covariance, right, that we make that assumption over here.   It just means that we're going to acknowledge that individuals who perhaps start higher on a given process might have an association with how they change over time, okay, and that is what this covariance allows us to estimate.   Now, ultimately, what we're modeling is our observed data, right, our observed measurements for anxiety, and so here is the full path diagram   that would characterize the linear growth curve. Notice, I'm going to focus on one anxiety time point first, that first time point,   and again, using the idea of tracing the path diagram, we can see how anxiety at time one is equal to one times the intercept, which is right here in this equation,   plus zero times the slope, so this part just falls out, plus that error term. In other words, what we're saying is that anxiety at time one is simply going to be the intercept of that individual plus some error.   Then we can do the same, tracing the path diagram, to see the equation for anxiety at the second time point. You can see that it's, once again, that intercept plus one times the slope, so here is basically the   initial value of that person, the intercept, plus some amount of change.   And then, at the third occasion, it's basically again...just by tracing this, we see that the equation implies that we have a starting point, which is the intercept, plus now two times the slope.   Right, so notice how in these latent variables, the factor loadings are fixed to known values, and we are fixing those values to something that forces these trajectories to take a linear shape.   
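Because the model is so close to a mixed-effects/random-coefficients model, a rough Python analogue using statsmodels is sketched below. It is not the JMP SEM platform and will not reproduce SEM fit statistics such as CFI or RMSEA; the long-format file and column names (id, time, anxiety) are assumptions for illustration.

```python
# Random-coefficients analogue of a linear latent growth curve (statsmodels).
# Assumes a long-format table with columns: id, time (e.g., 0, 1, 3), anxiety.
import pandas as pd
import statsmodels.formula.api as smf

long_df = pd.read_csv("anxiety_long.csv")   # hypothetical long-format file

model = smf.mixedlm(
    "anxiety ~ time",          # fixed effects ~ mean intercept and mean slope
    data=long_df,
    groups=long_df["id"],      # one trajectory per person
    re_formula="~time",        # random intercept and random slope
)
fit = model.fit()
print(fit.summary())           # random-effects covariance ~ intercept/slope variability
```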
So here the factor loadings of that slope is basically the way in which time is coded in the data, and this is the reason why everybody in SEM actually needs to have the same   time point for a measurement, right, because everyone that has the value of anxiety at time one is going to have that   same time code, which is embedded into the way in which we fix these factor loadings.   Alright, so now, in this particular specification can actually work, you know, perfectly fine if we have, for example, yearly assessments of anxiety.   But notice here what I'm emphasizing is that there's equal spacing between the time points, right, and that's important because, in order for this to really be a linear growth curve, there needs to be equal spacing here.   But obviously this could be weekly assessments or they could be assessments that are taken every month and that's fine. This is going to work out great.   Now, it could be that you don't have equal spacing and that can also be handled fine in SEM as long as everybody has the assessment at the same time point.   So here's an example where there's one month spacing between the first measure of anxiety and the second one, but then from the second to the third, there were   two months, and so what we have to do is fix the loading of that last...the slope loading here, instead of two, it has to be now fixed to three, right, in order to capture...and notice from one, we jump from one to three and that's what assures us that we still have a linear trajectory here.   Alright.   So it's time for the demo, and what I want to share with you is some data that come from the COVID-19 Psychological Research Consortium. It's a group of universities that got together and wanted to really start collecting longitudinal data to understand the extent of   the damage really that the pandemic is having on people's mental health and even their physical health. And so we have three waves of data.   And these are from a subsample of the UK, and just like I showed you in that previous slide, the repeated measures are in fact from   March 2020, and then a month later in April, and then two months later in June. And we're going to be looking at repeated measures for anxiety.   The survey for anxiety could vary from...could range the scores from zero to 100, where 100 means higher anxiety.   And then we're also going to look at health complaints over time. Those could range from zero to 28, whereas, you know, higher score for percent more health complaints.   And we're going to look at one time invariant variable which is resilience and this one was assessed at the beginning in March 2020.   Okay, so let's take a look at the data.   So I have the data right here. And notice, we have a unique identifier for each of our individuals, so each row represents a person. Actually,   there's some missing data there that we're not going to worry about right now. But   notice we have some demographic variables and then further to the right here, we have our data on anxiety and those are the repeated measures that we're going to focus on first.   Now I do want to say that initially, you would want to, you know, plot your data with some nice longitudinal graphs,   but we're going to skip straight into the modeling because I want to make sure we have time to show you how to use the SEM platform for these models.   So I'm going to go to analyze, multivariate methods, structural equation models. 
And I'm going to use those three anxiety variables and I'm going to click on model variables and okay, in order to launch the platform.   So notice that, as a default, we already see a path diagram that is drawn here on the canvas and we can make changes to that diagram in a number of ways.   I usually use the the left list, the from and to list, where we can select the nodes in the diagram and we can link them with one-headed arrows or two-headed arrows, right. I can just show you here, so by selecting them, we can make some changes here.   And I can click reset here on the action buttons, in order to get us back to that initial...initial model, and we can also add latent variables by selecting our observed variables in this tool list and then also adding latent variable here with that plus button.   So nice thing for us today...and I'm sorry about my dog is barking in the background, but we probably have some mail being delivered.   But the nice thing today for us is that we have this really useful model shortcut menu. And if we click on here, we're going to see that there's a longitudinal analysis menu with a lot of different options for growth curves.   So let's start with the intercept only latent growth curve. And here the model that's being specified for us is one where each of our anxiety measures is only specified to load onto an intercept factor.   And so this is one of those models where there's only a flat line, but we have a variance on the intercept acknowledging that individuals have flat lines, but they could have different intercepts for them.   Now we don't know if this model is going to fit the data well. In many instances, it won't because it's a no growth model, and   nevertheless, it's actually quite useful to fit this model as a baseline so that we can compare our other models against this one, right. And we do   label the model no growth as a default here when you use that shortcut. So I'm going to click run and very quickly, we can see the the output here.   There's two fit indices are really important for SEM. These are over here. The CFI is something that we want to have as close as possible to one, and you can see here, this is...   this is pretty low. Usually you want to have .9 or higher,   at the least. And RMSEA, we want it to be a most .1. We really wanted to be as close to zero as possible.   And so, this is very high, and so, not surprisingly, it's a poor fitting model, so we're not even going to look at the estimates from it, because we know it doesn't fit very well.   But we're going to leave it there because it's a good baseline to have in order to compare against. So going back to the model shortcuts, we could look at the linear growth curve model.   And when I click that, I automatically get that slope factor added and notice that   the factor loadings are there, and as a default, we just fix them to zero, one and two. Now the way in which this   shortcut works, is that it assumes that your repeated measures are in the platform in ascending order. It's really important, because if they're not, then these factor loadings are not going to be   specified like...they're not going to be fixed to the proper values.   
In fact, here you can see that June is fixed to two, but I know that there's two months in between April and June and so I'm actually going to have to come in here and make the change by selecting this   loading and clicking on fix to, and I'm going to fix it to three, because I know that that's what I need to have to really have that linear growth curve.   And so that's it. We're ready to fit the model and so I'm going to click run.   And notice what a great improvement in the fit indices we have, right. The CFI is nearly perfect and the RMSEA is definitely less than .1, so this is a very good fitting model and we can now   look at the parameter estimates to try and understand what are the trajectories of anxiety.   The first bar we can see is the means of the intercept and the slope.   They are statistically significant and they tell us the overall trajectory in the data, so on average individuals in March started with an intercept of 60, about 67 units,   and over time on average, they're decreasing by about five and a half units every month. Because of the way that the slope factor loadings are coded, we know that this estimate represents the amount of change from one month to the next.   Some of the very interesting estimates in this model are the variability of the intercept and the slope.   And notice they're also substantial in this model, which basically means that, yeah, we have that average trajectory, but not everybody follows that trajectory.   That means that some individuals can be increasing, while others are decreasing and others might be staying flat. And so a natural question at this point can be,   you know, what are the factors that help us distinguish between those different patterns of change? And that is a question that can be really   easy to tackle in this framework and we're going to do that by bringing in factors that predict intercept and slope.   So on the red triangle menu, I can click on add manifest variables, and let's take a look at resilience as a predictor.   So I'm going to click OK, and by default, resilience has a variance and a mean and that's okay, because I want to acknowledge has a non zero mean and variance,.   but I want it to be a predictor, so I'm going to select in the from list, and I'm going to select intercept and slope in the to list.   And we're going to add a one-headed arrow to link them together and have the regression estimates, so we can understand whether resilience explains differences in how people are changing.   And so I'm just going to click run here, and we see that this is, in fact, a very good fitting model.   And it has some really interesting results, because it shows that the estimate of   resilience predicting the intercept, that initial value of anxiety is, in fact, statistically significant and negative. And it can be interpreted as any   standardized regression coefficient, meaning that, for every unit increasing resilience, this is how much we should expect the intercept in anxiety to change, right. So the more resilient you were in March,   the more likely you are to have lower score for your intercept in anxiety in March, so that's really interesting, but then again resilience in this model does not seem to have an effect on how you're changing over time.   Okay, well, that's really interesting, but I really want to get to the idea of fitting multivariate models in SEM, so let's go back to the data.   
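In the random-coefficients analogue from the earlier sketch, regressing the growth factors on a time-invariant predictor such as resilience corresponds to adding the predictor and its interaction with time: the main effect plays the role of the path to the intercept, and the interaction plays the role of the path to the slope. Column names remain hypothetical.

```python
# Adding a time-invariant predictor of the growth factors (illustrative analogue).
import pandas as pd
import statsmodels.formula.api as smf

long_df = pd.read_csv("anxiety_long.csv")   # hypothetical long-format file

model = smf.mixedlm(
    "anxiety ~ time * resilience",   # expands to main effects + time:resilience
    data=long_df,
    groups=long_df["id"],
    re_formula="~time",
)
print(model.fit().summary())
```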
And I've already specified ahead of time...I saved a script that models again, just a linear univariate model of health complaints over time.   So we have an intercept and we have a slope and I fit this model, you can see it fits very well as well, and so we can look individually at both   anxiety and health complaints over time. And that is often times a good way to start to look at the univariate models first.   And so here health complaints, as a reminder, could range from zero to 28, and we can see that the trajectory according to the means here, average trajectory   is described by an overall intercept of about four and it has increases over time of about .3 units.   And in this case, there seems to be significant variability in the intercept and not so much...not not for the slope, so people are generally changing in the same way. Overall, individuals seem to be increasing by .3 units every month in their health complaints.   Okay, so now let's use this red triangle menu, and once again we're going to click add manifest variables, but what we're going to add are all three repeated measures for anxiety.   So I'm going to click OK, and as a default, we're going to put the means and variances of anxiety, but I don't want the means of anxiety to be freely estimated.   What I really want is for the means to be structured through the intercepts and slope factors. So I have to select those edges, and I'm going to remove them so that   instead, what I'm going to start building interactively here is a linear growth curve that looks just like this one, but for anxiety.   So I'm going to start by selecting all the three measures here, and I'm going to name this latent variable intercept of anxiety. I'm going to click plus.   And now there's the intercept factor but notice as a default, we will fix the first loading to one for any latent variable.   But because we want this to take on the meaning of an intercept, we actually want to fix these two loadings to one. I'm going to click here,   fix those to one, and now we have to add the slope. So I select all three of them, and I'm going to say slope of anxiety. I'm going to click plus.   Now that slope is over here. Again as a default, we fix this first loading to one, but I know that I want to code this in a way that that first   factor loading is zero, so I'm simply going to select that factor loading and I'm going to click delete to get rid of it, because that's the same as fixing it to zero, and then I'm going to fix this loading to one.   And that last loading needs to be three, in order to have that linear growth.   Now we're almost done. Remember that the most interesting question that we'll be able to answer in this bivariate model is   to look at the association of growth factors across processes. So we're going to select all of these nodes in the from and to list and we're going to link them with double-headed arrows. Those are going to represent the   covariances across all of these factors, and the last thing we need is to add   the means of intercept and slope for anxiety. So we're going to click over here, and that's it. We're ready to fit our bivariate model. I'm going to click run.   And notice it runs very quickly. The model fits really, really well, and these mean estimates, once again, describe the trajectories for each of the two processes. I'm going to hide them, for now, so that we can interpret some of the other estimates with a little more ease.   I think there's some really interesting findings here. 
You can see these values are in a covariance matrix, so we could actually change this to show the standardized estimates, just so that we can interpret these covariances on a correlation metric. But what's really interesting is to see that there are positive significant associations between the intercept, that is, the baseline starting values of individuals in their health complaints, and how they're changing in their anxiety over time. In other words, the higher your intercept is, your initial value of health complaints, the more likely you are to have higher rates of change in anxiety. And we also see that positive association between the baseline values in health complaints and anxiety. And there's another positive association here that's really interesting, because this is a positive association between rates of change. So the more you're changing in health complaints, the more likely you are to be changing in your anxiety. So if you're increasing in one, you're increasing in the other, so that's really insightful. Once again, we can still come back and add a little more complexity by trying to understand the different patterns of change in this model, so we can go to add manifest variables and look at how resilience impacts all of those growth factors. So I simply add it as a predictor here very quickly. The models do start to get a little cluttered, so we're going to have to move things around to make them look a little better, but this is ready to run. It runs very quickly. It fits really well and we could, you know, we could hide some of these edges, like we can hide the means and even the covariances for now, just so that it's easier to interpret these regression effects. And so you can see that resilience has a negative association with both health complaints and anxiety at the first occasion. In other words, the more resilient you are in March, the more likely you are to have lower values in the health complaints and in anxiety, so that's really cool. And we also see here that for the rates of change, in the case of anxiety, the rate of change is not significant, the prediction isn't, but it is significant -- this line really should be solid, because you can see that there is a significant negative association between resilience and the rate of change in health complaints, such that the more resilient you are, the more likely you are to be decreasing in health complaints over time. That's really interesting, especially when you tie a well-being or mental health aspect, like resilience, into something more physical, right, like those health complaints. Alright, so we're running out of time, but the very last thing I want to show you here, just because I really want to show you the extent to which SEM is so flexible and can answer all sorts of interesting questions. I actually fit a model that is a bit more complex, where I'm looking at three different predictors of all of those growth factors. And I also brought in measures of loneliness and depression in June at the last occasion. And what I did here, again I left this with all the edges, just so that you could really see the full specification of the model. But I can hide some of the edges, just to make it easier to understand what's happening here. What I did is I added loneliness and depression, and I'm trying to understand how the patterns of growth are predicting those outcomes, alright. So here you see those regressions.
And we're also adding some interesting predictors like the individual's age and the number of children in the household, in addition to resilience, as we saw before. And I could spend a long time just really unpacking all of the interesting results that are here. Without a doubt, you see, solid lines represent significant effects, so you can see that your patterns of growth in health complaints significantly predict depression at that last month in June. So that's, to me... I find that fascinating, and you can also see how resilience in this case has a number of different significant effects on how people are changing over time. Here is an interesting effect, where for every unit increase in resilience, we expect the rate of change in health complaints to decrease by .02 units, so it's a small effect but it's still a significant effect, so it's really interesting. And there's a number of things that you could explore just by looking at the output options. At the very bottom here, I included the R squares for all of our outcomes, and you can see we're not explaining that much variance in the intercept and slope factors here, so that means that there's still a lot more that we can learn by bringing additional predictors to this model. Okay, so let's go back to our slides, and I want to make sure that we summarize all the great things that we can achieve with these models. You can see that growth curve models allow us to understand the overall trajectory and individual trajectories of change over time. They allow us to identify key predictors that distinguish between different patterns of change in the data, and allow us to examine effects that those growth factors have on outcomes. And when it comes to multivariate models, it's really nice to see how changes in one process can be associated with changes in a different process. Now it's important that we remember in our illustration that the data were observational, so we cannot make causal inferences, and also, we were using manifest variables for anxiety, but anxiety is an unobservable construct. So just be aware that if we had experimental data, we could make causal inferences, and we could have also specified latent variables for anxiety, such that we had more precision on our anxiety scores. Alright, so I think, even though we cannot make causal inferences, it's pretty fair to say that resilience appears to be a key ingredient for well-being, and so I want to make sure that this is the take home message today, because I think as the months continue to pass during this pandemic, we all need to find ways in which we can foster our resilience, so that we can, you know, deal with whatever comes as well as we can. And so with that, I want to make sure that you have some references in case you want to learn more about longitudinal modeling, and I thank you for your time.
Michael Akerman, Senior Data Scientist, Novozymes Johan Pelck Olsen, Data Scientist, Novozymes   The difficulty introduced by a democratizing software like JMP is handling the vast number of users creating and sharing tools and scripts across a large, global corporation. At Novozymes, we've built an automatic toolset called Y'all to help address that need. Y'all is a Python build script, automated workflow, and JMP add-in package that makes building, sharing, and updating JMP add-ins as easy as uploading a photo to Facebook.   Utilizing the continuous integration/continuous deployment capabilities of GitLab (an open-source competitor to GitHub), Y'all allows users to upload raw JSL scripts saved from any platform in JMP through a web user interface, immediately launching an automated Python build script that structures and packages a JMP add-in on the GitLab server. The next time JMP loads, our tool automatically checks for updates (based on work shared by Jeff Perkinson in the JMP community), sending new or updated scripts to users within seconds.   In addition to these capabilities, Y'all allows users to specify the menu name and tooltip of their launchable script, automatically loads a variety of convenient utility functions for JSL coding, automatically adds imported functions to the Scripting Index, and leverages GitLab to allow parallel development of scripts while retaining a full version history of each JSL script and the add-in package, ensuring that valuable work is archived and protected in a centralized system.     Auto-generated transcript...   Speaker Transcript
Michael Akerman Hello, everyone, and welcome to our talk today on "Y'All." This is an add-in package manager that we've built in Novozymes to automate JMP add-in sharing and updating and maintenance of scripts across our global company. I'm Michael Akerman, and my co-presenter is Johan Olsen.   Today, we want to talk to you about Y'All and introduce this toolkit to you, so that any of you who are interested in it, who might want to use it for personal use or for your organization, know how to get this toolkit, how to start using it, and what it is. So we'll start by talking about who we are and why we built Y'All...   ...the key features in Y'All as well as the user experience. So Johan's going to go through what a user sees when they're using this toolkit.   We'll go behind the scenes about what's happening behind the Y'All toolkit when a user interacts with it, and then we'll do a summary and recap, and have some tips on adopting Y'All in your organization, if you find it interesting for your purposes.   We are data scientists at a company called Novozymes. My co-presenter is Johan Olsen. He's a data scientist in agricultural and industrial biosolutions application research.   I'm a senior data scientist in the same group in Novozymes. We've included our contact information here if you'd like to reach out about Y'All or anything else. Please feel free to reach out to us.   Johan's email address is jpko@novozymes.com and he is JPKO on the JMP User Community. I am mxak@novozymes.com and mxak on the JMP User Community, so please feel free to reach out if you have any questions or follow up or comments in any way about Y'All or ways to improve what we've built and what we're working with.   We work for Novozymes. Novozymes is a global biotechnology company. We build biological solutions for sustainability...   ... and improve performance and reduce costs for all kinds of industries around the world. So we support agriculture and biofuels, household care, food and beverages, animal health and lots more industries...
...with these biological solutions. We've got more than 6000 employees globally and in that set of employees, we've got hundreds of JMP users.   And I mention that because we've got a really large cohort of JMP users that we support and that we work with who are all working with JMP and building things in JMP, and that's really the inspiration for Y'All.   Our purpose as a company is "Together we find biological answers for better lives in a growing world -- let's rethink tomorrow."   As I mentioned, we've got a very large cohort cohort of JMP users in Novozymes.   So we wanted to make their lives easier to work with JMP scripts and JMP add-ins and make their code more reusable and discoverable.   So first off, we wanted to make updating easy. We had observed that across Novozymes people were emailing scripts back and forth.   There were sending JSL files on personal network drives. Some of them were putting it on flash drives and handing them around. We wanted to make distributing new and updated JMP scripts as effortless as possible.   We wanted to make an archive that safeguards the code. We've got a lot of JMP users, all of this work, represents a lot of value to the organization.   So a central repository where all that valuable code is stored and backed up and archived.   And we wanted to track versions over time, so as scripts change, as they're improved, we wanted to be able to see those changes and track those changes over time.   And we wanted to make it easy to discover and reuse scripts so if you're a JMP developer...   ...if you're a developer work writing JSL scripts, you can find relevant scripts written by other developers and improve on them or...   build off of them, for whatever needs you have. So make it convenient to take those scripts and build off of them, also make it convenient to share and import and reuse functions in JSL that other members of the organization would find useful.   So, to make this a reality, we put together three components. We've got a Python build script, we've got a GitLab interface, and we've got a JSL load and update script all working together in this toolkit.   The Python build script is an automated script that will run in Python 3 or above that automatically parses and structures JSL files into add-in menu structures.   It also writes the add-in definition files. It generates an imported custom functions script that adds all of these reusable utility functions to the JMP scripting index and into the JMP IDE, and it zips it all into a .jmpaddin file for use by users.   GitLab is our interface for users to interact with add-in scripts and working with the Add-in Builder itself. GitLab is the open source alternative or competitor to GitHub.   We have this posted internally within Novozymes, and it provides a really easy point-and-click user interface for uploading scripts into this central repository...   ...where all of these scripts are stored and the build actually happens.   And that CICD pipeline, which stands for continuous integration continuous deployment, automatically executes that Python build script and distributes the add-in through both release and beta version channels so users can have either...   ...steady quality-controlled releases, or if they're a developer, they can get frequent quick releases with the very latest versions of each script used in this toolkit.   And finally, we've got the JSL load and update scripts. When JMP opens up and when this add-in loads,...   
...they can check, it will check for new versions on the selected channel -- the beta or release channels -- and offer to download and update that add-in package. This also imports the custom functions into the scripting index.   We've got super easy script contribution in Y'All from any JMP user. It's as easy as uploading a photo on social media.   You go into the user interface, click the upload button, and in seconds it's available on the beta channel of Y'All.   We've got automatic version control and storage of these scripts, an automatic or manual JMP add-in update, so when JMP opens, Y'All will offer to update...   ...or a user can choose to manually update Y'All through the Y'All menu.   We have beta version channel for rapid build and testing if you're a developer writing scripts to get the very latest version of the add-in as you update your script and make changes to it. And we've got a release channel for controlled deployment and quality control.   This gives us discoverable and reusable JSL files, so that people can look through this file structure of scripts submitted by users across Novozymes, Y'All users, and look in and see...   ...find scripts that meet a purpose that's close to what they want and modify them for their own purposes.   And we also add convenience utility functions into the scripting index; you can see a little screenshot of that...   ...on the left and into the JSL editor, so the build script automatically adds functions that are importable and that are reusable into the JMP scripting index with a nice little doc string if that's available to describe how this function works and how to use it.   I did want to take a second to acknowledge some of the code we've included in Y'All for demonstration purposes. Much of the code used within Novozymes is specific to Novozymes' systems and wouldn't be usable outside the company.   So I've grabbed some scripts that were on the JMP User Community already and included them just as demonstration purposes. You can feel free to keep these or remove them from Y'All...   ...as desired. So we've got a json parser script from Xan Gregg and Craige Hales. We've got a data cleaning script assistant presented at JMP Discovery Summit 2020...   ...by Jordan Hiller and Mia Stephens. We've got image table tools by Heidi Hardner and Serkay Olmez at Discovery Summit 2020.   Color row selection.jsl by Nick Holmes. Multiple Y axis chart by Byron Wingerd, and autovalidation setup, I believe, also presented at JMP Discovery Summit...   ...last year by Mike Anderson, and we've got a suite of importable functions for the Riffyn data capture software, Riffyn Nexus...   ...to work with their API, written by Riffyn incorporated and taken from their Riffyn tools JMP add-in with permission. That's a tool that we use heavily in Novozymes. We included this because if there are other Riffyn users, you can feel free to use this code as well.   So with that out of the way, we wanted to jump into some live demos, starting with the user experience and what the user sees and does as they work with the Y'All toolkit. For that, I'm going to hand control over to my co-presenter Johan.   Johan, you're muted. Johan Thank you, Michael.   Sorry about that. Let me just share my screen.   Are you seeing this? Michael Yes. Johan All right.   So as Michael said, I will be walking you through the user experience of using Y'All.   It's really a three-tiered thing to be a user of Y'All. 
The vast majority of our users basically use it as if it was a built-in JMP platform. They...   ...point and click at the scripts and run them for the purposes that they were intended. We have a pretty large user group as well who are contributing with scripts to also be going through how...   You can contribute simple scripts. And then we have a small group of developers and maintainers (??) that do more advanced stuff, and we'll try to pull a little bit of that as well.   But first of all, the real simple usage of Y'All. Installing it is as simple as clicking on this link to the build artifact (??) that's created in our GitLab repository and downloading the JMP edit file and running it with JMP.   Now this prompted me for an update, I'm just going to cancel this time around, but you can see here that on my menu bar, and I have the Y'All menu...   ...With some of these demo scripts that Michael mentioned. I won't be going into details with any of these scripts as they're not really part of the Y'All package but are just included for demonstration purposes. But what I will do is I'll try to go in and...   ...oops, enable, sorry this was already, enabled it appears. enable beta testing, so that, now that we start creating some things we can see them right away.   Most users will have beta testing disabled, and they will be prompted with an update when they open JMP after a major release, which is once every few weeks. But for users with beta testing enabled, it might be anytime from several times a day to a couple times a week when the beta branch will be updated.   So that's basically what's in it for a normal user.   If you want to contribute scripts to this, there are a couple ways of doing it. One thing is through the GitLab UI. Now the menu structure that we just saw is mirrored in the...   ...directory structure in the...   ...repository. So creating a new...   ...menu or sub menu is as easy as just creating a new directory. Here, let's just call this...   ..."demos" johan and add it. Johan And if we would like to add a script, we can simply create it in the JMP IDE so you want to make a very simple.   Hello world.   This will be good enough. There's actually a little extra detail, which is to name the menu items in a nice way to get a tool tip, we implemented the [???] or code magic, that is, if you start by writing commented out, oops.   And you.   Menu name. Michael Akerman Y'All.   Menu name. Johan Y'All menu name michael Akerman Old habit. Johan Let's call it hello. Then Y'all tooltip.   All right, let's go ahead and save this.   To wherever, I guess, let's see, desktop. And then upload it here.   Was it test? Michael Akerman It was script but should be OK. Johan Now...   Once we've uploaded this file, we've triggered a pipeline in this automated [inaudible].   [inaudible] system, which is a cloning the repository first and then...   ...afterwards building the add-in, and this should prompt upon a restart of JMP, johan and update of the all edit. Johan And once we've chosen to update, the demos menu is now present, with the test script.   So this is the simplest way of contributing scripts to JMP. If you are familiar with using Git, you can also use a more classic Git workflow. Just going to open a batch window here. Let me just...   And we can clone the repository. Let's go back to... Some of the things that we're showing here will be specific to using GitLab, which we use here.   But it will be similar in other remote repository coding, code hosting systems.   
So now we are cloning the repository, and we will have a complete mirror of the repository in our local system. And similar to before, we can see the directory structure mirrored here, and let's try to see if we can create a...   Another script.   For this. Y'All menu name improved   hello.   Say we'd like to, maybe this time..   ...open a prompt instead.   It says hello world.   Now, this time around, we can use one of the built-in functions that were included with JMP. Let's go to the scripting index, we can...   ...search for Y'All utils.   And find the errorModal function, and it comes here with a little doc string, of how to use it.   And it's already been imported so...   Y'All utils errorModal johan And we can just say hello world. Johan let's see this.   into our repository...   ...under demos. And push the code.   Now the code is pushed to the remote repository, triggering the CICD pipeline, and in a matter of...   ...seconds, should be able to see the pipeline running and...   ...soon enough.   Hasn't started yet.   Maybe I'm getting a little bit...   ...impatient. There we go.   So sure enough, if we go in and check for updates to Y'All, we will see a new update. Is that it?   We will choose to update...   ...the add-in, and now the demos will have our improved hello world.   All right, so I think that was what I had for the user demo. Michael Akerman Thanks, Johan. I'll take back over the screen share then, if that's OK with you. Johan Yes. Michael Akerman And I wanted to go in a bit to what's happening behind the scenes, when this user experience is happening. So I'm going to look at this code in Visual Studio Code.   You can look at this wherever you want once you download the Zip file, and I'm going to go into Zen mode real quick to make it a little bit easier to see. Here we go.   So here's this GitLab CICD pipeline that's running whenever Johan uploaded that script. So he went into the Git repository...   ...and he hit the plus button uploaded a script, and that fired off this CICD pipeline. So this is a small script that defines how the server should react to that situation.   a pre step and a build step...   ...where it cloned the Git repository and all the files in it, including Johan's new script into a working directory where this...   ...this processing module called a runner can work on that code. And then it built it, so it runs this build stage where it...   CDs -- changes directories -- into that build directory and runs that Python build script. So it's as easy as that really. It's cloning all of these new files into a working folder, moving into that folder and running the Python build script in Python 3.   So let's take a look at that Python build script. I'm going to control tab over to that. This is build_yall.py.   So this is a lot of code, but if you're looking at this and want kind of a guide to how to look through it, scroll all the way down to this process function. It says def process.   This is the main code that builds Y'All and step through it, step by step. So we're going to peek into this as we go.   The first thing that happens is the Python script builds a menu hierarchy. So let's peek into that. What it's doing is making a nested Python dictionary where it looks through that folder structure under src/scripts...   ...and pulls all of the JSL files it finds into a dictionary that contains the name and the tooltip and the file location of that JSL file. So it's building that nested dictionary one folder at a time, basically, and crawling through. 
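As a rough illustration of this crawl, and of the menu name and tooltip parsing described next, here is a minimal Python sketch. It is not the actual build_yall.py; the exact comment syntax and dictionary layout are assumptions for demonstration only.

```python
import re
from pathlib import Path

# Assumed "magic comment" format at the top of a JSL script, e.g.:
#   // Y'All menu name: Improved hello
#   // Y'All tooltip: Opens a hello-world prompt
NAME_RE = re.compile(r"//\s*Y'?All menu name[:\s]*(.+)", re.IGNORECASE)
TIP_RE = re.compile(r"//\s*Y'?All tooltip[:\s]*(.+)", re.IGNORECASE)

def menu_entry(jsl_path: Path) -> dict:
    """Name, tooltip, and location for one script; falls back to the file name."""
    text = jsl_path.read_text(encoding="utf-8", errors="ignore")
    name = NAME_RE.search(text)
    tip = TIP_RE.search(text)
    return {
        "name": name.group(1).strip() if name else jsl_path.stem,
        "tooltip": tip.group(1).strip() if tip else "",
        "file": str(jsl_path),
    }

def build_menu(root: Path) -> dict:
    """Mirror the folder structure under src/scripts as a nested dictionary."""
    menu = {}
    for child in sorted(root.iterdir()):
        if child.is_dir():
            menu[child.name] = build_menu(child)   # sub-folder becomes a sub-menu
        elif child.suffix.lower() == ".jsl":
            menu[child.name] = menu_entry(child)   # each script becomes a button
    return menu

# Example: hierarchy = build_menu(Path("src/scripts"))
```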
  Then we check the command names of each of those JSL files. We try to parse those files and look for that Y'All menu name and Y'All tooltip that you saw Johan put in at the top of the script to...   ...define what the button name should be and what the tooltip should be in the add-in menu in JMP. Whenever Y'All is installing, that Y'all menu shows up, what those should be called and what the tooltip should say, if you hover over it.   If it cannot find that Y'All menu name or Y'All tooltip kind of magic command, it will instead just fall back to the name of the script with no tooltip. So test.jsl would just become test as you saw from Johan.   That dictionary of commands in menus is sorted alphabetically to make it easy to find things, except for the help and settings menu, which is popped up to the top to make that always at the top of the menu.   Because help and settings are important. And then all of that information is written into a JMPcust file.   So a .jmpcust file is essentially a kind of markup language document that defines for JMP how to...   what menu buttons should be, where should the menu itself be, what should the tool tips be, are all defined in this .jmpcust file.   I'm going to control tab over to Y'All, so you can see what this looks like. It's literally a bunch of definitions about what the menus are...   ..and what the menus are called and what they contain. So we can see the help and settings menu contains check for Y'All update, switch beta testing mode on off, and so forth.   So the Python script is going to write that automatically with this write jmpcust file function.   And then we're going to write our add-in definition files. So addin.def and version.txt. Addin.def -- I'm going to switch over to an example of that -- is a little...   ...bit of description that JMP uses to identify the add-in and find the version number, as well as the minimum JMP version supported.   We say 12. That's kind of an arbitrary choice on our part, just to not have to worry about things before 12. You can change this in the Python code, if you start working with this yourself. The add-in version is also defined here, as you can see.   And we write this into a file called version.txt so that same integer number you just saw in addin.def ends up in version.txt as well.   The reason for that is that's a nice small file we can read remotely. We can open in JMP over HTTP in the server...   ...and take a look at that version number very quickly in that version.txt file to compare to our current version. So if our currently installed version doesn't match the version.txt...   ...that means we need to update, or we at least can update. The version number we use is the date time that this ran in the UTC timezone.   And we convert that to basically seconds since 1970. So it's an integer number that's convertible back into a date time format.   So we can compare the two numbers on an integer basis, which is nice. We can easily see if they're greater than or less than each other, and we can still convert that back into a date. So it's a really efficient way to store that version number in JMP for us.   Finally.   The next thing we do -- so we've written those add in depth files -- we create a custom functions script. So this is a little strange and a little hard to explain. I'll do my best to do it quickly and succinctly.   JMP has two ways of defining custom functions. 
One is with a function called function in the JMP Scripting Language, and what that does -- I'll switch over to utils.jsl and the importables sub sub directory of Y'All here.   Function defines a custom function that can be used and called from other locations. And you can define this with a namespace. So in this case, we've got the namespace Y'All utils...   ...with the function name recode if. And it takes these arguments, and we've got this doc string and this function code written here.   The second way to write a custom function in JSL is with the new custom function function, which is not the same as function. I believe this was added and JMP 14.   What new custom function does is it adds the function to the JMP scripting index and into the JSL IDE to make it very easy to work with that function and to reuse it in other in other places with documentation.   So the advantage of function is that it works all the way back to I don't even know what version of JMP...   ...before I was using JMP. The advantage of new custom function is it is a bit more robust, and you get the documentation and the JMP scripting index and the ability to colorize it in the JSL IDE and get...   ...tips on it as you're writing code. So we like new custom function, but we want to support JMP before JMP 14. So the build scripts...   ...here, will essentially parse through the importables directory, looking for any function definitions matching that functions function function function...   ...in JSL with a doc string. So it will try to parse out that doc string between the slash asterisk demarcations and pull that into a new custom function. So you can see an example here.   We take that recode if function and now rewrite it automatically into a new custom function in the Y'All utils namespace called recode if.   And we've got a prototype and a description and a scripting index category where this all shows up. So that's all available to JMP developers automatically. So you can write things using the older function function in JSL and still get this JMP scripting index...   ...ability, by using the Y'All build script automatically. Scroll back down to my process code here.   All of this being written, we now build the add-in. JMP add-ins are using a Zip format that's been modified. So we Zip it together using Python 3, rename it to YALL.jmpaddin and that's now available in our working directory where the CICD runner is working.   So the final step of this CICD pipeline after this build script is complete is to move into that working directory and upload artifacts.   Build artifacts are a GitLab CICD concept, where we take these two files YALL.jmpaddin and version.txt and upload them to the server and make them available with a specific URL.   So that URL is always consistent, and it's tied to the branch that we're running this on in GitLab, so if we're on the beta version versus the main release version.   Now, in JMP -- I'm going to jump over, ha!...   ...to autoUpgrade.jsl. This is where we check for new versions of the add-in. When JMP loads, Y'All is going to run this auto-upgrade script,...   ...which looks for those new versions in the GitLab repository. You can see here we load our beta flag, whether on beta mode or main mode.   And we look in the specific channel for that mode. So we're pulling...   ...from a specific URL based on that channel selection. So if we're in beta mode, it's going to look the beta channel. If we're in main mode, it's going to look at the main channel. 
And we look for the YALL.jmpaddin file and the version.txt michael Akerman And now we're going to compare that version number to our installed version number, and if it's different, request an update from the user. Pop up a dialog box that you saw in Johan's screen... Michael Akerman ...that asks them if they want the update and tells them what that version number is. And they can choose to or choose not to.   So.   That's all there is to it really. All that happens behind the scenes, for the user. They upload a script through that GitLab UI using that plus button to upload a file and build that new...   ...JMP add-in, that Y'All JMP add-in, and make it available to other users who are connected to the same repository.   All that happens behind the scenes. It's as easy for the user as saving a JMP script from the red hot spot in JMP and uploading it to the GitLab repository.   So I wanted to recap on the features of Y'All.   We have very easy script contribution. As you've seen, you've got a nice user interface to upload files, upload JSL scripts, and make them available.   And with that comes automatic version control and code archiving. So since this is all happening through a central GitLab repository stored on Novozymes servers, all of that is tracked and stored...   ...in perpetuity. Automatic updates so developers can release scripts to their users very easily, and they can always have the most recent version...   ...of those scripts that work properly and are quality-controlled. And we've got beta and release channels.   So the beta channel is going to be essentially instantly built, available within a couple of seconds of a new script hitting Y'All and updated frequently. And the release channel is going to be a more slowly controlled release structure. I wanted to touch on how we do that.   You can see here the branch structure in our GitLab repository. We've got a beta branch with a lot of updates. We've got a release branch. We've got a main branch. What we do in Novozymes -- this is totally up to user choice -- michael Akerman ...we take this beta branch when it's in a state where we want to release it. We open a merge request between beta and release, and move it to the release branch. That's going to essentially freeze the screen release branch... Michael Akerman ...at the code version, where we did that, and that locks the code down on this release branch.   Beta can still change. People can still add scripts and change scripts on beta. But release is now frozen. We quality-check that, and when we're prepared, when we're happy with the state of it...   ...we release it to main and merge it into there, and now that is available to main branch users of Y'All.   We have discoverable code. Everything's available in a file structure on a central repository. Developers can look through the scripts and...   ...find similar ones to build off of and improve on. And we've got utility functions automatically added into the scripting index, automatically available to developers, so that they can reuse blocks of code like that recode if...   ...or some database queries and things like that that we've included as demonstrations as well. So, if you look through the Y'All...   ...namespace in the JMP scripting index with Y'All installed. You'll see that we've got some some postgres database queries and postgres database tooling that we've included to demonstrate how we use that as well.   
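Pulling together the packaging and update-check steps described above, here is a minimal sketch under the same caveat: it only illustrates the idea (an integer UTC-seconds version stamp, a zip archive renamed to .jmpaddin, and an integer comparison) and is not the actual build_yall.py or autoUpgrade.jsl.

```python
import zipfile
from datetime import datetime, timezone
from pathlib import Path

def write_version(staging: Path) -> int:
    """Stamp version.txt with whole seconds since 1970 (UTC): sortable and date-convertible."""
    version = int(datetime.now(timezone.utc).timestamp())
    (staging / "version.txt").write_text(str(version))
    return version

def package_addin(staging: Path, out_file: Path = Path("YALL.jmpaddin")) -> Path:
    """Zip the staged add-in folder; a .jmpaddin file is essentially a renamed zip archive."""
    with zipfile.ZipFile(out_file, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in staging.rglob("*"):
            if f.is_file():
                zf.write(f, f.relative_to(staging))  # keep paths relative to the add-in root
    return out_file

# The update check on the JMP side then reduces to an integer comparison:
# needs_update = remote_version > installed_version
```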
All of this is happening by starting in JMP with a user-created JSL script that they want to share with somebody else in the Y'All user base. They go into GitLab, and in the user interface upload that into a subdirectory matching the menu location where they want that in Y'All. So if they want it under demos, like the test script in our demo, they would put that in the src slash scripts slash demos subdirectory. Once they do that, an automatic build pipeline fires off that launches the Python build script, which parses, structures, and builds the add-in and re-uploads that to the GitLab server. JMP then finds that new update and automatically offers to download it and update the Y'All add-in in JMP. And that script is available within just a couple of seconds to users on that beta channel.   So we hope you're excited about Y'All. We hope you're interested in using Y'All yourself or in your organization. We wanted to give some tips on how to get started, if you are.   The Zip file that's posted in the JMP User Community -- if you're watching this video in the JMP User Community, it should be on the same page -- has all the necessary elements to get started. Some of those will need modification. Some of those are demo code for you to play with.   But feel free to download that and take a look. The Python build script in that Zip file is built on native Python 3 libraries. There's nothing installed but Python on our build system. And it should run on any standard Python 3 installation.   The central repository storage and automation that we do with GitLab for CICD, for continuous integration and deployment, that's available through other providers. GitLab, GitHub, Azure, AWS all have similar implementations.   There's a huge ecosystem of options out there, so the implementation details will be specific to your organization.   The unfortunate side effect of that is that the CICD pipeline file gitlab-ci.yml will require modification if you want to run this in an automated fashion on your own systems. It will not work as it is on your systems, unless of course you're a Novozymes employee.   Any connection strings in our utility functions that we have that reflected our internal beta database have been replaced with ODBC my underscore connection underscore string.   If you want to use any of those functions, you will have to go into the code in the working directories and change those strings to reflect your database.   So, once again, we hope you're excited about Y'All. We hope you're interested in Y'All. If you have any feedback, questions, comments, please feel free to reach out to me or Johan...   ...with them. We would also love any contributions you have, if you take this code and improve on it or take this toolkit and use it in an interesting way. Thank you.
Lim Yong Kai, Student, Singapore Management University Chen Li, Student, Singapore Management University   Economic theories are supposed to interpret why and how the economy behaves and then determine the best solutions to influence or solve the economic phenomena. However, these theories are full of assumptions, hypotheses and contexts, in terms of moral values and politics. Price movements of stock markets have never been explained well by any economic theories. Hence, our question as investors: Is there a holistic way to understand the price movements in the market without application of any economic theory? An option would be to use unsupervised learning to detect objective patterns of the subject without the requirement of any domain knowledge. We believe one approach to understand real-world complexity is to get the pattern first, followed by forming and studying the theories.  In this paper, we explore more than 100 ETFs in China's stock market without prior domain knowledge of each ETF by using dynamic time warping clustering (DTW) and the agglomerative hierarchical clustering method to detect the similarities of their price movements. Our results show that clusters from the DTW method largely coincide with the type of industries that ETFs involve. Analysis of the clusters’ price movement also revealed that certain industries performed better after 2019, when compared to 2018 in light of China’s new self-reliance economic direction.      Auto-generated transcript...   Speaker Transcript LIM Yong Kai Okay. Hi everybody. So I'll be presenting our paper. The paper title is Analysis of Stock Market with Unsupervised Learning, specifically an analysis of ETFs in China stock market using dynamic time warping. So, my name is Yong Kai the author of the paper, and Chen Li is my coauthor. So the paper was something for Associate Professor Kam in a master's program at Singapore Management University. So the agenda of my presentation today as follows. First, I'll start off with an introduction and objective of the paper. Next will be the data preparation, followed by the clustering. Next will be the analysis of our clusters' performance and, finally, will be the conclusion and the future works of our paper. So I start off with the introduction and objective. So exchange traded funds, or ETFs in short, allows investors to get access to many stocks across various industries and they tend to have better risk management through the diversification in their portfolio. So China's stock exchange is the world's second largest stock exchange operator, and due to the popularity of ETFs, many new ETFs have been created over the years, which makes it hard for investors to select and choose the ETF to invest in. So the objective of our paper aims to narrow down the vast number of ETFs and help investors with little domain knowledge to build their investment portfolio by using unsupervised learning techniques. So two clustering techniques will be used here, mainly the dynamic time warping and also hierarchical clustering. So these two clustering outperform to aggregate the ETFs into clusters with similar price movement pattern. So in this paper, we will look at the ETFs listed in the Shanghai and Shenzhen stock market in China. So the software tools that will be using is definitely JMP Pro. We're using the version 15.2. JMP Pro is used in our data preparation and also our analysis and results analysis using Graph Builder. 
And we will also use SAS Enterprise Miner for our clustering, which will perform dynamic time warping and also hierarchical clustering. Moving on will be the data preparation. So six different data sets were used, namely the daily closing prices of Shanghai's ETFs from 2018 to 2020, and also the daily closing prices of Shenzhen's ETFs, similarly from 2018 to 2020. So the workflow to prepare the data for clustering is as follows. We will first import our six data sets into JMP Pro. And then we will join the multiple tables for both Shanghai and Shenzhen ETFs. And next we will use the missing data pattern function in JMP Pro to clean up the data set. Thereafter, we also tabulate the monthly return for each ETF using the formula as given, which is the price at the end of the month, minus the price at the start of the month, divided by the price at the start of the month. And finally, we combine both tables together and it's ready for clustering. So firstly we will use the import multiple files function in JMP to batch import our six files from our raw data set folder. So here, you can see, we are using the import multiple files function, and this is the folder for our data sets. So, as you can see, with this function it is very easy for us to import all the different data sets into JMP Pro at once. So, as you can see, our 6 data sets have been loaded. Next, our second step will be to join all the different tables together. Here I'm using the Shanghai 2018 data and I'm performing an outer join with the 2019 data, and the matching column will be the serial number of each ETF. So after combining this, it will comprise all the ETF data for 2018 to 2019. Next, I'll perform another outer join with the 2020 data. And now the resulting data set will have all the data, ranging from 2018 to 2020, for Shanghai's ETFs. And of course we would repeat the process for the Shenzhen ETF data set as well. So, moving on, we'll check for missing data, and we will use the missing data pattern function under the Tables tool bar in JMP Pro. So under the Tables tool bar, we can choose the missing data pattern and we can select all our columns to check for missing data. As you can see, there are 98 rows that have no missing data, and we can select that pattern, which will link back to our main table. We can select those 98 rows and this will be our resulting data set for clustering. This will also be performed for the other data sets. Next we'll also tabulate our monthly return, which is the formula I mentioned earlier, and what we do is we insert a column into the data. Of course, we will insert a formula. As mentioned, the formula will be the closing price at the end of the month, which is on the 31st of January 2018, minus the price on the first trading day, which is the second. And, of course, divided by the price on the first date. This gives us the monthly return as a proportion, or percentage. So of course we can fine tune these; we can change the column name to January of 2018, and of course, we can change the format to percentage with two decimal places. So here is the monthly return for January, and we can repeat for all the months throughout the entire data set. And lastly, after both data sets for Shenzhen and Shanghai ETFs are prepared, we will need to combine them together. So we will use the concatenate function in the Tables tool bar to combine both data sets into one, which will be used for clustering modeling after.
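Purely as an illustration of the same preparation steps in code (the talk does all of this with JMP's point-and-click Tables tools), here is a rough pandas sketch; the file names and date columns are made up for the example.

```python
import pandas as pd

# Assumed layout: one wide table per market and year, rows = ETFs identified by
# a serial number, columns = daily closing prices.
years = ["2018", "2019", "2020"]
sh = pd.read_csv("shanghai_2018.csv")
for y in years[1:]:
    sh = sh.merge(pd.read_csv(f"shanghai_{y}.csv"), on="serial", how="outer")  # outer join on ETF id

sh = sh.dropna()  # keep only ETFs with a complete price history (the missing data pattern step)

# Monthly return = (price at month end - price at month start) / price at month start
sh["2018-01 return"] = (sh["2018-01-31"] - sh["2018-01-02"]) / sh["2018-01-02"]
# ...repeat for every month, do the same for the Shenzhen tables (sz), then stack the two markets:
# all_etfs = pd.concat([sh, sz], ignore_index=True)
```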
So as you can see, both data sets for Shenzhen and Shanghai ETFs have been combined into one with 143 rows. This will be used for our clustering thereafter. So now moving on to our clustering. Hierarchical clustering uses a proximity matrix to determine pairwise similarity and dissimilarity between each monthly return using Euclidean distance. On the right, dynamic time warping is used to compare similarity between time series, which accounts for the time factor. It can factor in propagation delays and even detect time series that share similar patterns that are slightly out of phase. So although JMP Pro is able to perform hierarchical clustering, dynamic time warping is only available in SAS Enterprise Miner, hence both clustering models were run in SAS Enterprise Miner. Now perhaps JMP Pro might want to consider including dynamic time warping in the clustering function in the future, so for us users, we can have an all-in-one software to perform our entire analysis from data preparation to modeling and, finally, the visualization of our results. So after the clustering is completed, the data was loaded back into JMP Pro from SAS Enterprise Miner for visualization and analysis. So clustering is an unsupervised modeling method and JMP Pro Graph Builder is very useful in aiding to visualize these results. An appropriate visualization to be used is the heat map, which allows users to quickly get insights from the clustering result. So along the Y axis are the different ETFs and along the X axis are the monthly returns of each ETF. So from this heat map, we can quickly observe that within this cluster, the monthly returns show signs of similarity indicated by the intensity of the color, which is the monthly return. So next we will compare two clusters from both clustering techniques. On the left will be the dynamic time warping and on the right will be hierarchical clustering. So the dynamic time warping algorithm clustered seven different ETFs, which all belong to the growth enterprise market, on the left, and on the right is the hierarchical clustering, which is a cluster of a combination of some growth enterprise ETFs and other ETFs together. So dynamic time warping also clustered two additional ETFs belonging to the enterprise sector, which were missing in the hierarchical clustering, and they are boxed in red on the left. This exhibits the merits of the dynamic time warping algorithm in comparison to the hierarchical clustering algorithm, where it can pick up out-of-phase patterns when the clustering is performed. So I will show... So, as you can see, this is what we have done in JMP Pro. On the left will be the dynamic time warping and on the right will be the hierarchical clustering. So using Graph Builder, we can interactively see the same ETF within the same period as they are both linked to the same data sets. For other comparisons, we can have a look at cluster five and cluster nine. So the local data filter here allows us to toggle between the different clusters easily to make our comparison. You can see we can look at the similar ones; as you can see, these few are not present in the dynamic time warping cluster. As you can see here, it's linked. And of course we can maybe have a look at the last one, so these two are comparable clusters that are similar: the dynamic time warping grouped four ETFs together, whereas the hierarchical clustering grouped about eight of them together. So, moving back.
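To make the dynamic time warping idea concrete, here is a minimal Python implementation of the DTW distance between two return series. The presenters used SAS Enterprise Miner's implementation; this sketch is only meant to show why out-of-phase but similarly shaped series still come out as close.

```python
import numpy as np

def dtw_distance(x, y):
    """Classic dynamic-programming DTW distance between two 1-D series."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])               # local distance between points
            cost[i, j] = d + min(cost[i - 1, j],       # stretch x
                                 cost[i, j - 1],       # stretch y
                                 cost[i - 1, j - 1])   # match
    return cost[n, m]

# Two series with the same shape, shifted by one period, still score as very close:
a = [0.00, 0.10, 0.50, 0.10, 0.00]
b = [0.00, 0.00, 0.10, 0.50, 0.10]
print(dtw_distance(a, b))   # small value, even though the peaks are out of phase
```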
So there are several similar results when we compare the two sets of clusters using the JMP Pro Graph Builder heat map, which is also documented in our paper; we will post the paper on the JMP Community with all our findings. Comparing the clustering results with the ETF portfolio compositions, the dynamic time warping algorithm managed to cluster ETFs of similar industry or portfolio more accurately than hierarchical clustering. This may be because, instead of only calculating the Euclidean distance between the same time periods, as the hierarchical clustering algorithm does, dynamic time warping can slide along the time axis to calculate the shortest distance between two time series and so detect patterns that are slightly out of phase. Now we move on to the analysis of the clusters in terms of their monthly return performance. JMP Pro Graph Builder was used to plot the monthly return trends; it lets us build visualizations from multiple ETFs' monthly returns, with a box plot for each month and a smooth trend line throughout the three-year period. The interactive features allow us, as users, to draw insights from the visualization for analysis. From the graph, we can also observe that the range of the fluctuations from 2019 to 2020, in blue here, is larger compared to 2018. The interquartile ranges of the months that follow are also much larger, which signifies that for those months the variance of the monthly returns among the ETFs is also larger. From the graph you can see that the best performing months are, in fact, February of 2019, July of 2020, and February of 2020. On the contrary, the worst performing months are March of 2020, followed by December of 2018 and then June of 2018. To drill down and display only specific clusters, we can use the local data filter in JMP Pro Graph Builder. This is similar to the demo I showed, where we can toggle between the different clusters to zoom in on and analyze individual clusters. We picked two clusters here: on the left, mainly ETFs belonging to military manufacturers, and on the right, technology companies in the mature enterprise market. Both clusters saw record highs from 2019 to 2020, which are also the best performing months, as shown on the previous slide. However, taking a closer look at the box plot for each month, the cluster on the left generally has a smaller interquartile range than the cluster on the right. This indicates that picking any of the ETFs from the cluster on the left would produce a more consistent monthly return than the ETFs on the right, especially if you were buying between 2019 and 2020. If we drill down on the four clusters' heat maps using JMP Pro Graph Builder, we can observe that from 2019 to 2020 the monthly returns perform generally better than in 2018: there are more high-intensity positive returns, indicated by a brighter green color, from 2019 to 2020 than in 2018. We did some investigation, and our preliminary assumption is that this has to do with the rising trade war tension between China and the United States, which started in 2017 and disrupted China's economy and trade markets. In 2019, the Chinese president also called for China's self-reliance.
According to statistics, China's high-tech manufacturing made up a larger portion of the country's industrial growth in the first half of 2019, as the country shifted away from dependence on foreign technology and other products. Furthermore, China has also invested heavily in industries such as artificial intelligence and integrated circuits to achieve its goal of self-reliance. This economy-driven growth might have led these few industries to grow positively in 2019 and 2020, after the announcement and the shift in the country's direction. Finally, I will close with our conclusion and some of the future work that can build on our paper. Although both hierarchical clustering and dynamic time warping produce very similar clustering results, there are merits to the dynamic time warping algorithm, as it managed to cluster the ETFs more accurately. Furthermore, the ETF monthly returns form a time series, and the dynamic time warping algorithm was able to produce better clustering results on them. We also observed a correlation between the clusters' monthly return performance and China's economic goals, which resulted in better monthly performance in those sectors. Some future work that can be done is to explore other correlations between industries and the country's macroeconomic factors to draw different insights. We can also apply dynamic time warping to other financial instruments such as stocks, commodity prices, currencies, or even derivatives. Lastly, the analysis can be performed over different time periods, such as pre-COVID and post-COVID, which of course we have not yet reached, or at different time intervals: for now we are using monthly returns, but you could also look at weekly, quarterly, or yearly returns. I would like to thank JMP for giving me the opportunity to showcase my work through this platform, and I would also like to thank my co-author and mentor for supporting me throughout this journey. Lastly, I will reflect on my personal experience as a JMP Pro user. JMP Pro provides an excellent user experience; it is a very interactive and dynamic data analytics software. It allows us to complete our data preparation much more quickly and with more accurate results. The Graph Builder function is also very handy and useful for the analysis of our results; we can draw beautiful insights and interesting information from it. And lastly, statistical analyses such as hypothesis testing, and other modeling techniques such as clustering or PCA, are also very user friendly in JMP Pro. Thank you for your time. That sums up my paper.
Andrea Coombs, Sr. Systems Engineer, JMP   Ten years ago, I gave a JMP Discovery talk entitled, "JSL to the Rescue: Coping Strategies for Making Sense of Real-Time Production Data in Vaccine Manufacturing," which proposed some approaches for creating a custom workflow to make sense of curve data from a manufacturing process. The amount of data, data processing, and modeling was overwhelming, and a considerable amount of scripting was necessary to monitor, characterize, and improve the process. Ten years later, we now have the Functional Data Explorer, which would have made this effort not only much easier, but also more informative in terms of process performance and process understanding. In this presentation, I show how making sense of bacterial fermentation data in the Functional Data Explorer is a piece of cake and how the results can bring peace of mind -- no coping strategies necessary!   The end of this presentation includes some thoughts on monitoring fermentation growth curves over time. One option discussed is to calculate the integrated difference of each curve from a standard curve. The following JMP Community discussion shows how to calculate the area between two spline fits. https://community.jmp.com/t5/Discussions/Calculating-area-between-two-spline-fits-bivariate/td-p/1180       Auto-generated transcript...   Speaker Transcript Andrea Coombs, JMP Hello, my name is Andrea Coombs, and I am a systems engineer with JMP. Today I'm going to tell a story from my years working in industry, specifically in the pharmaceutical industry. Here is what I'm going to talk about today in a nutshell. Ten years ago, I gave a presentation on coping strategies for making sense of real-time production data in vaccine manufacturing, and in that presentation I discussed ways of dealing with curve data that we were getting in real time from sensors on our manufacturing floor. Today, I want to talk about some of the new tools we have in JMP, specifically the Functional Data Explorer, and how they would have made all of this work a piece of cake. I'll go into ways to use the Functional Data Explorer for process understanding, process optimization, and process monitoring. Let's first start by talking about the process and the curves we were dealing with. Here on the left, we have an example of a hypothetical vaccine manufacturing process. It starts with material prep; in this case we're doing bacterial fermentation in two stages, then the product is filtered, absorbed, and harvested; downstream the product is centrifuged, and finally multiple sublots of material are formulated together into one lot of material. Today I'm going to focus on the fermentation stages of this process. While bacteria are fermenting, they go through growth, and the growth curves look something like the ones over here on the right. During growth there is a lag phase, an exponential growth phase, a stationary phase and, finally, a logarithmic death phase. While I was at this company, we were going through an evolution of process understanding, specifically around these growth curves. We started off early on by just taking five samples from these two fermentation stages, sending those samples down to the QC lab, and waiting for results. We got optical density, so we could tell the density of cells in those samples, and the results looked something like what you see here on the right.
We had a distribution of results at those specific time points, but we really didn't know what the growth curves looked like. They could have looked like any of the curves I drew in over those distributions down below; we just really didn't have any idea. But then we implemented inline turbidity probes, so we could actually see what those growth curves looked like. However, we had to gown up and go into the manufacturing suite to manually download the data, bring it back to our desks, and then try to figure out what we were going to do with all of it. And then, finally, we upgraded our PLC to where we could have all of the data from all of our sensors at our desks, with no need to gown up and go back into manufacturing; we had all of the data at our fingertips, which was great. But through careful analysis, we were able to determine that we had too much data, and we had a lot of data to deal with. That's why I gave that presentation on coping strategies for making sense of real-time data: we put a lot of work into making sense of our data, and I wanted to share some strategies you could take to make sense of yours. The first thing I talked about was parameterizing those growth curves. We had these curves, but we wanted to be able to look at the growth and understand what was going on, so we parameterized them. In other words, we took the slope of the growth, the slope of the death rate, the duration of time that the bacteria spent in each phase, and the turbidity at certain time points, and now we had a big collection of what we called growth variables. These growth variables were very useful for us; we used them for process understanding, process optimization, and process monitoring. The second group of coping strategies I talked about was the processing of all this real-time data. Of course, once we started finding the data useful, we wanted access to it right away. So the first thing we had to figure out was all of our data sources and how to get access to everything we had: not only data from our historian, but data from our ERP system and our laboratory information management system. We had to bring all of that data into one place. We used a super journal as a kind of command central for accessing all of our data and running all of our scripts. This was really a lot of work, and we were pretty proud of ourselves, but in hindsight, knowing what tools we have in JMP, specifically now with JMP 16, a couple of things would have made this work a lot easier. With the growth curves, we could have used the Functional Data Explorer to analyze the curves directly; we wouldn't have had to parameterize them, which left a lot of the data behind, because we can use all of the data in the Functional Data Explorer. And then, of course, action recording in the enhanced log would have made the scripting efforts so much easier. But today I want to focus on how I would have used the Functional Data Explorer. Of course, I don't have access to that data anymore, so I simulated some growth curves. Here I have 20 different curves, which represent some of the variability you could see during bacterial fermentation. With the Functional Data Explorer, it really boils down to these four steps.
The first step is to take your collection of curves and smooth all of the individual curves. The second step is to determine a mean shape and shape components that describe all of the variability you see in these curves. The next step is to extract the magnitude of the shape components for each curve. And once you have these shape components, which are actually called functional principal components, you can use them for however you want to analyze your data, whether your curves are inputs to models or outputs of models, or you want to do clustering analysis. You can really do any kind of analysis with these functional principal components. In the example today, all of my curves are outputs of a model, so I'm going to use functional DOE to take a closer look at them. So let's get into JMP. Here I have my data table, where I have the batch numbers, the time and the turbidity, which make up the curves, and a bunch of process parameters. The first thing I want to do is some process understanding, to see if any of these process parameters are impacting the variability I'm seeing in the curves. Let's go into the Functional Data Explorer. We'll put in time as our X and turbidity as our Y, we'll put in our ID function, and then we'll take all of the process parameters and enter them as supplementary variables. The first step is to fit a flexible model to all of our curves, and I'm going to use direct functional PCA to do that. Here are those flexible models. The second step, which JMP has already done for me, is to determine the mean curve and the shape components, and JMP has identified six different shape components. Over here on the left, I can see those six shape components, and here is the cumulative amount of variability they explain in my data. You can see that just the first two shape components already describe 92% of the variability in my curves, which is great. The remaining shape components I'm going to exclude, because they're probably just explaining random variability in my data. So let's customize the number of FPCs down to two, and I'm left with those two shape components. Now, the next step is determining the magnitude of each of those shape components for each curve, and I have them here in my score plot. On the X axis I have Shape Component 1, and on the Y axis I have Shape Component 2. Below, I have a profiler where I can look at my curve, which I've highlighted here in yellow, and you can see how the first shape component impacts the shape of the curve, mainly the growth phase, while the second shape component impacts mainly the death phase. Let's take a closer look at a couple of these batches. Here's the curve for Batch 15, and we can see the magnitudes of Shape Component 1 and Shape Component 2. I can enter these right into my profiler, and now you can see I'm able to reproduce the curve for Batch 15. Let's look at one more, Batch 2: this looks very different compared to Batch 15, and the magnitudes of the shape components are very different as well. Let's enter those, and you can see we're able to replicate that curve too.
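Steps two and three above amount to a functional principal components decomposition. As a compact sketch of what is being estimated (generic notation, not JMP's internal labels):

```latex
y_i(t) \;\approx\; \mu(t) + \sum_{k=1}^{K} c_{ik}\,\phi_k(t)
```

Here $y_i(t)$ is the turbidity curve for batch $i$, $\mu(t)$ is the mean curve, the $\phi_k(t)$ are the shape components (functional principal components), and the scores $c_{ik}$ are the per-batch magnitudes shown in the score plot; keeping $K = 2$ retains about 92% of the curve-to-curve variability in this example.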
So, knowing the mean shape and the two shape components that represent the variability around the curves, we can use the magnitudes of those shape components to do an analysis, and when your curves are the response in your model, you can do that analysis right here in the Functional Data Explorer. Down below, I have a new profiler. This time in my profiler I have my curve again, which I'll highlight in yellow, and then I have the pH, temperature, pressure, and mixing speed parameters. You can see that for mixing speed the profile is pretty flat, so it's not impacting the shape; the same is true for pressure. But when I start to change temperature, you can see it changing the shape of my curve, mainly the growth phase, and as I change pH, it's mainly impacting that phase of my curve as well. So that's the approach I would take for process understanding. What about process optimization? By process optimization, I mean that if there is a particular curve we want to standardize on, what pH and temperature would we need in order to get that desired curve? The first thing you need to do is define what that desired curve looks like. I have a few data points here at the bottom of my data table that describe the shape I'm looking for, and you'll see them here in Graph Builder. In this case, I'm interested in a curve with a really fast growth rate and a really slow crash rate. This is really easy to add in the Functional Data Explorer; we just need one additional step. I'll recall what I did before, and the additional step is to load that target function, identifying which function represents our target. Now I'm going to replicate what I did before: I'll do my direct functional PCA, customize my FPCs down to two (these results are exactly what we were seeing before), and then come up and do my functional DOE. Again, I'll highlight my curve here. Now that the target curve has been identified, I can maximize the desirability; in other words, I can find the set points for pH, temperature, pressure, and mixing speed that give me a curve that looks most like that target curve, with a really fast growth rate and a slow crash rate. I would just need to start with media at a pH of 7.2 and use a temperature of 37.9 in my fermenters. So that's the approach I would have used for process understanding and process optimization. When we're talking about process monitoring of growth curves or curve shapes, there are some different options out there, and I have a couple of thoughts here. If this is something you want to do, I highly recommend you check out the 2020 presentation, "Statistical Process Control for Process Variables That Have a Functional Form." That presentation was recorded and it's really great. They talked about taking those functional principal components and putting them into model-driven multivariate control charts; they did a great demonstration of that and also discussed some possible drawbacks of the approach. So I want to discuss another approach you might want to consider, and that's looking at the integrated difference from a standard.
Just as before, when we were looking at that target curve, you could use the target curve as a standard and then, for each batch, calculate the difference from that standard. For example, here I have a curve from one batch represented in red, the standard curve in blue, and the difference between them on the green curve down below. You can integrate over that difference curve to find the area under it, and then use that in your statistical process control program. I just want to quickly take you through the steps of how to do that. Here are my time values, my standard turbidity values, and the turbidity for the first batch, and I've calculated the difference between the two. Then you can use Fit Y by X to build a spline model for that difference over time, and save the coefficients for that model to a data table. What you get is your X values and the four coefficients that define the spline model. Here's the formula to build the spline model, and then you integrate this formula. You can either do this through scripting or, as I chose, in the formula editor. I've determined the area for each slice of the curve here by integrating the previous formula by hand. Once I have these area slices, I just need to add up the ones I'm interested in: I can do the area for the total curve, or just for the growth phase or just the death phase, and I've done that up here in my table variables. So I now have a variable for my total integrated difference, my integrated difference during just the growth phase, and my integrated difference during the death phase. Essentially, you just go through and do that for each of the batches. Here I've done this for all 20 batches, and now you can do your statistical process control on those variables. So those are the approaches I would have taken for process monitoring, and also for process understanding and process optimization, using the Functional Data Explorer. I hope you check it out and that it's a piece of cake for you as well. Thank you.
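As a supplementary note on the "area of each slice" step above: assuming each saved spline segment on $[x_i, x_{i+1}]$ is a cubic in $(x - x_i)$ with coefficients $A_i$, $B_i$, $C_i$, $D_i$ (the naming convention is an assumption; check the saved-coefficients table for the exact form), each slice area has a closed form:

```latex
s_i(x) = A_i + B_i (x - x_i) + C_i (x - x_i)^2 + D_i (x - x_i)^3, \qquad h_i = x_{i+1} - x_i,
\qquad
\int_{x_i}^{x_{i+1}} s_i(x)\, dx = A_i h_i + \tfrac{1}{2} B_i h_i^{2} + \tfrac{1}{3} C_i h_i^{3} + \tfrac{1}{4} D_i h_i^{4}.
```

Summing these slice areas over the growth phase, the death phase, or the whole run gives the integrated-difference variables used for control charting.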
Stanley Siranovich, Principal Analyst, Crucial Connection LLC   In this session, we analyze actual process data from a Kamyr digester at a Canadian paper mill. Our data set consists of 301 measurements of 23 variables, taken over more than 12 days. Our dependent variable, which we want to predict and control, is the Kappa Number. It is roughly proportional to the residual lignin content in the pulp and is a measure of the effectiveness of the pulping process. For independent variables, we have a combination of chip, steam, air, and liquor flow rates, along with temperature, level, and moisture measurements. Several characteristics make this data set particularly challenging: the division of the continuous reactor into separate zones, each of which performs a different function; separate heat inputs, plus liquid inputs and outputs for each zone; and countercurrent liquor flows. Because of the above (and because we want to understand the process, not just predict an outcome variable), we use several of JMP's platforms from the Predictive Modeling, Screening, and Multivariate Methods menus. Finally, to demonstrate analytical flow and successive discovery, we conduct most of the session as a live demo.     Auto-generated transcript...   Speaker Transcript Stan Siranovich Okay, starting off, we have the observation in the first column, and right away we see we have a little bit of a weird format; we'll address that later. The next column is Y-Kappa, and that is what's known as the target variable; that is what we would like to predict. It is the measured value, and we'd like to keep it within a narrow range, and in another minute or two I will explain why. As we scroll through the data, we see a couple of things. We've got some oddball labels: we have T-upperExt, whatever that is, we have T-lower, we have WhiteFlow. And we notice that some of the columns, as a matter of fact most of the columns, have a number at the end of them. After some investigation, we find out that that is the lag time, and I'll get to that when I explain the process. So we have things like chip moisture and steam heat, and we also see that we have some missing values. There are a few here and there, but mainly they look to be confined to the AAWhite and, at the very end, the sulphidity columns, so we make note of that. Now, since we're not familiar with any of this, probably the smart thing for us to do would be some research to see exactly what it is we're looking at (let me get the PowerPoint up here). All right, let's begin. The Kamyr digester is a continuous four-step process for the manufacture of pulp and paper. These processes are quite large; you can think of one as a giant pressure cooker. What we do is add heat, steam, and pressure to the reactor, which can be 18 feet in diameter and up to 200 feet tall, and what we want to do is separate out the lignin while controlling the Kappa number. The feedstock is a little bit unusual. I came out of the chemical industry and I'm used to working with a liquid feedstock, but here our feedstock is wood chips or sawdust from a sawmill. And we don't have any information as to whether these feedstocks are dumped in together, whether they go in one on top of another, or whether they're run in campaigns, but right away we know we have some variability here.
No matter what the physical form is, the feedstock is composed of 50% water, 25% pulp, and 25% lignin, and of course the lignin is what we want to dissolve out. Now, the process looks like this. It is complex, and within this single unit are four separate unit operations. We have the top, where the chips are dumped in along with the reaction mixture, which is referred to as a liquor. It has some caustic, some alkalinity to it, but you can think of it as a soap solution; it's going to separate the lignin from the wood. We also have the fluid flows, both the effluent and the liquors, of which there are three, and they can be either cocurrent or countercurrent, so there's another variable we introduce. As I mentioned before, the target variable is the Kappa number, which is an ISO determination, and we've got another variable here in that the fluids are recirculated and regenerated. So let's start off with some data exploration. Here's what the JMP data table looks like immediately after I brought it into JMP. We start off with the observation, Y-Kappa, which we want to measure, and then the chip rate. Here are some other ratios, and we can see that we have some missing data; most of it seems to be confined to that column there and this column over here, the sulphidity. So let's take a look at those to see if they're anything we should be concerned about, and the best way to do that, I think, is to go up to Tables and look at the missing data pattern. What I'll do is click, drag, and select all those columns, add them in, and we get this table. You should see the missing data table right on top of my JMP table; let me expand that first column a little bit. If we look at the first column here that says Count, and the number of columns missing is zero, and we see zeroes all the way across in the next columns, that's our pattern, and it tells us we have 131 rows with no missing data; but we've got 132 rows in which two columns have some missing data, and where you see the 1 and the 1, that is where the data is missing. We have another pattern here with 19, but the rest of the table is fairly fully populated, and if you've ever dealt with production data or quality monitoring data, you know this is not unusual; as a matter of fact, this is rather complete for this type of data. So let's look at that a little more. We have 131 with one missing, and we know we have those two columns; let's see if they match up. That's it: it is the same records, with the same two columns missing. What we can do is select that first row and look across. Let me pull this out of the way and scroll, and sure enough, we see the blue highlights there, which mark the non-missing data, and we see that the missing data tends to line up: there's a dot here and a dot here, so it looks like all the missing data in those two columns falls in the same rows. So let me go back and stop the share. We're now going to do some data exploration, so I will close that window, and we are back at the original JMP data table.
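A minimal JSL equivalent of the Tables > Missing Data Pattern step walked through above; the two column names are placeholders for the sparse AAWhite and sulphidity columns in the digester table.

```jsl
// Tabulate the patterns of missing values across the two suspect columns.
dt = Current Data Table();
dt << Missing Data Pattern(
	columns( :AAWhiteSt, :SulphidityL )
);
```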
And let's see, where to start? We've got a couple of choices here, but one thing we can do is click on this icon right here that says Show Header Graphs. We click on the icon and we can see some header graphs, which show us histograms of the distribution in the column underneath each one. These header graphs are draggable, so we can make them larger in either direction, and we can do a little bit of exploration here. Since we want to control Y-Kappa, let's select some of the high values and see what we can see. Scrolling along, it looks like they're somewhat related to what's going on in BlowFlow, where they sit at the top of the range, and if we go to T-upperExt-2, we can see that they tend to be in the bottom half of the distribution. Other than that, nothing really sticks out, so let's click there and there to clean that up a little bit. Let's do a little more exploration and go to Graph Builder. We go to Graph, Graph Builder, and this window pops up. What I'm going to do is look at Y-Kappa — there we go — and I can put a smoother line in there. What I see is some variability in our process, but it's rather stable, maybe a slight upward trend here, but stable, and that's going to be important a little bit later. But if I look down here where it says Observation, what do I see? It starts with 1-00, then 01, and time keeps going up; but if I look at the data table, it starts with 31. So what they apparently did on the unit was start with the day of the month, then the hour, and they just continued with that format, but that's not what we want. So let me close this, stop the share, and bring up another JMP data table. Okay, what we're looking at is basically the same data; I cleaned it up a little bit and made some changes. I have it arranged the way I usually want it, with Y-Kappa, our target value, followed by the observation, and here I added an hour sequence column; let me show you what happens. I've saved all the scripts to this data table — it's the same data, I just opened a different copy that has been cleaned up — and you'll notice we have the same number of decimal places all the way across the table if I scroll over. Let's repeat what I did before: we go to Graph, Graph Builder, put Y-Kappa here, and now I add the hour sequence, and we get a different pattern. If I open up the original, we see that we have a different smoother line. What I try to do is look for what I call constellations, so here we have these three data points; I'll close this and open up the second one, and the three data points are here. Let me shut that. Notice that down here with the hour sequence, we have the data points in the order they occurred. Another thing we can do in JMP is go up here, switch out Observation, which we had before, and move it down here. We'll put our smoother line in again, and you can see we have the same pattern; but now if I take the hour sequence and drag it down here, we have the best of both worlds. The points are in order, which is how we want them, and if you notice at the bottom, it says observation ordered by hour sequence descending, which is the default. I could click Done, but for right now I'll just minimize that and move on to the next step; let me check my notes to make sure I didn't leave anything out. Oh, and as for how I put the new column in there, let me stop the share and, rather than going through the entire process, switch back to the PowerPoint.
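For reference, the Graph Builder view just built — Y-Kappa over the hour sequence with a smoother — could be scripted roughly like this; the hyphenated response name is wrapped in :Name() on the assumption that it matches the table exactly.

```jsl
// Y-Kappa over the hourly sequence, with points and a smoother to show the trend.
dt = Current Data Table();
dt << Graph Builder(
	Variables( X( :Hour Sequence ), Y( :Name( "Y-Kappa" ) ) ),
	Elements( Points( X, Y ), Smoother( X, Y ) )
);
```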
And I gave the column a new name, Hour Sequence. For data type, I chose numeric, because that's what it is; for modeling type, I chose continuous, because it is continuous data, even though it's taken at regular hourly intervals; and I chose a format that's easy for a human being to read. To initialize the data for the column, I chose sequence data and ran it from 1 to 301, because that is our last row, repeating each value one time and adding just one column. And for column properties, I chose Time Frequency from the drop-down and selected Hourly. So that is how I got that column. Let me stop the share on that, minimize it, and bring up the JMP data table again. Here we are back in the data table, and now we start with the fun part: the analysis. Let's go to Analyze, Multivariate Methods, Multivariate. We're presented with this launch window. We want to look at Y-Kappa, so we put that in as Y; we don't want any weights, frequencies, or By variables; and we click and drag to put all the other columns in. Click OK, and here is the result. To start off, we're presented with the scatterplot matrix, and here we can look for patterns in the data. We don't see a whole lot of them, but if we look at Y-Kappa over here in the left-most column and scan across, it looks like we may have some sort of pattern with this column here; scrolling down a little, it looks like it is one of the liquor flows, and so on. So we can use that to get a rough idea. Let me close that. We can also look at the correlations, where the blue values are positive correlations and the red ones are negative, and none of them are very high. In some ways that's good and in some ways that's bad, but we won't talk about that a whole lot right now; in some situations we get more information from this report than in others. Let me close that. Next, let's go to model screening: Analyze, Screening, Predictor Screening. We're presented with this window, and once again Y-Kappa is our Y response. For right now, we'll ignore the observation column, because we have the hour sequence, which is a whole lot easier to read, so we'll put that in and click OK. And this is what we get. We have our predictors here, and the ones with the higher values and bigger bars are probably going to be the most important predictors, ranked for us. We see a bunch of variables, and now we come to a decision point. The first two are obviously something we want to look at, probably the third one too, and the fourth. The rest are somewhat minor, but they're all about the same. Before I go on, a note on these two columns: Contribution is the contribution to the model, and Portion, which is the column I usually look at, is the percentage of the total that each predictor contributes, so if you added them all up it would come to 100%. We have a link here that says Copy Selected, so let's click and drag to select the ones we want and do that. I'll minimize that window and go back up to the Analyze menu.
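The two steps just described — building the Hour Sequence column and running Predictor Screening — could be scripted roughly as below; the predictor list is shortened to a few placeholder columns, so treat the exact names as assumptions.

```jsl
dt = Current Data Table();

// Hourly sequence column: 1, 2, ..., 301, one value per hourly record.
dt << New Column( "Hour Sequence", Numeric, Continuous, Formula( Row() ) );

// Rank candidate predictors of Y-Kappa (placeholder predictor names).
dt << Predictor Screening(
	Y( :Name( "Y-Kappa" ) ),
	X( :ChipRate, :Name( "BF-CMratio" ), :Name( "T-upperExt-2" ), :ChipLevel )
);
```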
Let's see — how about if we fit some models? Fit linear regression models. I'll click on that, and notice that the selected columns are already highlighted for us, so let's add them, and of course we want Y-Kappa again as the response. When I select that, it opens up an extension of the window with a couple more choices in the drop-downs. The first one is Personality, and it gives us a number of choices, but for right now let's just stick with standard least squares. Then for Emphasis we have three choices, and the emphasis determines what is shown in the initial report window; for right now, I'll just select minimal report. Degree is 2, so it will also look at squared terms. I guess we're ready to go, so I'll click Run. This is what I get, and it looks like BF-CMratio is of primary importance, followed by some other terms of less importance, but they are still significant. The reason they are significant is, number one, the p-value — by the way, we're working at the .05 level, and the blue line here marks the .01 level of significance. The report shows the LogWorth, which is minus the log base 10 of the p-value, so a p-value of .01 corresponds to a LogWorth of 2, and that's why the blue line is drawn at 2. Here you don't really need it, but if you're doing an analysis with a lot of variables, it definitely comes in handy. We look down at the summary of fit and find an R squared of about .5, and that's about the most we can get out of that window. Then we come down to the estimates, which are the coefficients for our multiple linear regression. We see that these two terms may not be needed, so in the interest of parsimony we'll select them, come over here, and remove them, and now we have two fewer variables. We could probably remove chip level as well, but for right now we'll just leave it there, because our R squared dropped a little bit, and everything else remained pretty much the same. So that is our model. We can come down here and open some other windows, and we see the t ratios of the individual terms and some effect tests, and that is where we will leave it for right now. Let me minimize that window.
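A rough JSL sketch of the least squares fit just described; the effect list is simplified to a few placeholder columns and leaves out the degree-2 terms, so it illustrates the launch rather than the exact model from the demo.

```jsl
// Standard least squares fit of Y-Kappa on a few screened predictors (placeholders).
Fit Model(
	Y( :Name( "Y-Kappa" ) ),
	Effects( :Name( "BF-CMratio" ), :Name( "ChipMoisture-4" ), :ChipLevel ),
	Personality( "Standard Least Squares" ),
	Emphasis( "Minimal Report" ),
	Run()
);
```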
We have one more analysis to do, and that is time series. We come up to Analyze again, down to Specialized Modeling, and go to Time Series. Click on that, and everything is still selected. We'll put Y-Kappa in as Y — one too many there, hang on — put our target in the list, and we have this button here, X, Time, where we'll put the hour sequence in again because it's easier to read. Put that in there and click Run. For right now, we'll ignore everything at the top and come down to look at our time series diagnostics. Scroll down a little more: the blue line here is the significance band at the .05 level, and these plots are looking at the autocorrelations and the partial autocorrelations at different lags. I could spend the whole 45 minutes talking about this, but we don't have that much time, so I'll just give a brief summary: what we want to do is bring all of these lags within that blue band. And it looks like, for this one, we have to go up to around maybe five or six, maybe even seven, and hit that point around three, four, or five with the partial autocorrelations. So what we do is come up here to the red triangle, and we want to fit an ARIMA model. We get this window; ARIMA stands for autoregressive integrated moving average. We don't want to do differencing here, because we've already done some differencing — those numbers appended to the ends of the columns were the lag times, so that all the data lined up. So we just make a guess at the order: let me make the autoregressive order 4, so lag 4 periods, and we'll do the same down here for the moving average. We could choose different values for those, and there are options here for the intercept and for constraining the fit. We hit Estimate and come down here again, and this is what we get; we're doing a little better than we did before. Let's come down and select this again; in the interest of time I'll put in six periods, check everything, and hit Estimate again. I'll scroll down, and now that looks a little better. We probably want to be within the four-, five-, and six-lag periods here, so there's one more thing to check if we look at the graphs. We can also save the scripts to the table. I repeated this fit a couple of times, and here I have a summary in the Model Comparison. We have the first model, the ARMA — again, we didn't do any differencing, so it's called ARMA, not ARIMA — and here's the five-period model; let me expand this just a little bit. This is the table; I worked on it a little and took out some columns that I didn't want to discuss, for reasons of time. What we can do is look at the results. It shows us the AIC, where lower is better, and it looks like the best model there is the first one we did, the (5,5), at 1268; then it goes up to 1270, 1271, 1271. Then we have the R squared, where of course larger is better, and it looks like the (6,6) model was the best there, with a slightly larger R squared, though certainly not significantly so. And here we look at the MAPE, the mean absolute percentage error, which of course we want to minimize, and it looks like the middle model, the second one with the six-period lags, is the best there at 7.55. Then we have some more output down here — the estimates, and various and sundry graphics — and all of these have been saved to the data table. Let me clean this up a little bit. So what we have here, in summary, is the day-to-day operation of a very complicated plant that we were able to explain and understand. We started off with a visual exploration of the data, then did some research to find out what we needed to know about the process so we understood what was going on in our analysis, and we were able to save everything to our data table. One quick thing: as we explored our data, we were able to move easily from one platform to another for our analysis. And that concludes the presentation.
Michael Nazarkovsky, Chemistry Department, Pontifical Catholic University of Rio de Janeiro (DQ PUC Rio) Felipe Lopes de Oliveira, Institute of Chemistry, Federal University of Rio de Janeiro (IQ UFRJ)   Covalent organic frameworks (COFs) are an emerging class of crystalline organic nanoporous materials, designed in a bottom-up approach by the covalent bonding of one or more building blocks. Due to the substantial quantity and diversity of building blocks, there is a massive number of different COFs that can be synthesized. In this sense, the development of machine learning techniques that can guide the synthesis of new materials is extremely important. In this work, after a set of 590 structures was selected, DFT and Grand-Canonical Monte Carlo methods were used to determine the enthalpy of adsorption and capture of CO2 or H2 by these structures. Attempts were also made to develop classification models. In the present study, the structures were classified by means of unsupervised (hierarchical clustering, PCA) and supervised (multiple logistic regression, Naive Bayes, KNN, SVM) methods coded in JSL (JMP Pro 15). The COFs were separated into 2D and 3D. The largest clusters were selected to perform supervised machine learning, stratifying the data by structure (62.9% 2D and 37.1% 3D) to avoid disproportion in the training/validation ratio (80%/20%). The 2D/3D classification by textural properties was successfully accomplished with 100% accuracy (validation) for all models. Other metrics, such as Entropy R2, Generalized R2, and Mean -log(p), were compared and discussed in order to select the optimal model.     Auto-generated transcript...   Speaker Transcript
Felipe Lopes So, hello everyone. My name is Felipe Lopes de Oliveira. I'm from the Federal University of Rio de Janeiro, and I'm presenting this work with Michael Nazarkovsky, from the Pontifical Catholic University, also in Rio de Janeiro. Today we present the work, Modern Approaches in Classification of Covalent Organic Frameworks by Textural Properties Using JSL. Covalent organic frameworks are a new class of materials composed only of light elements such as carbon, nitrogen, oxygen, hydrogen, and boron. They are organic, crystalline, nanoporous, highly stable, reticular materials. This means that the chemical bonds that form these materials follow a well-defined geometric pattern dictated by the topology of the network.
In this image, I'm showing an example of an HCB type lattice, which is composed of triangles and rods connected periodically, forming an extended network. These triangles and rods represent molecules, which we call building blocks, that are covalently bonded to form an extended structure with the geometric characteristics of this net. There are many different geometries of molecules that can be used as building blocks, and several types of chemical bonds that can be used to connect them. In this way, it's possible to form structures with extremely specific characteristics that we can control, with the physical and chemical characteristics of the pore interior defined by the building blocks. So we can think of the building blocks as Lego pieces, where different pieces can be used to build several structures with unique characteristics. In this slide I show different possible topologies that can be used to build both two-dimensional and three-dimensional covalent organic frameworks, as well as the format that each building block must have to build these networks. On the right, I show an example of a crystal structure model of a COF built on one of these networks, which will be used to calculate the properties of interest later. It's possible to see that it forms a structure composed of two-dimensional covalently bonded layers, with these layers stacked along the third dimension. Due to their incredible modulation capacity, COFs can be used for several extremely interesting applications, such as heterogeneous catalysis, energy production and storage, organic semiconductors, chemical sensing, thermal insulation, and gas capture and storage, among others. However, this large modularity can also be a problem, because the development of these materials is really complex. They are usually developed through a combination of chemical intuition and available molecules, with several steps of synthesis and characterization repeated until a suitable material is obtained. This development is very time-consuming, expensive, and requires several trained professionals to perform, and sometimes all these efforts do not lead to a material with the desired characteristics. So, to try to solve this problem, we propose a simple computational approach to study these materials and thus try to accelerate their development. We initially use a database of structures, known as CURATED COFs, which compiles several structures synthesized and reported in the literature. Then we use some routines to validate those structures and make sure that everything is okay. Next, we use a DFT-based electronic structure calculation software called CP2K to optimize the cell parameters and the atomic positions of the structures. Then we use a software package called ZEO++, which performs a geometrical analysis of the pores and of the accessible networks inside them, and determines a set of structural descriptors for the selected COFs. These descriptors are then passed to JSL scripts written by my colleague Michael in JMP to perform several extremely interesting analyses. Here I'm showing the main descriptors that we use.
First we have a structure ID, which is basically I think classification of the name of the structure.   And following, we have the LIS, which stands for large included sphere, which gives us the information of the pore diameter of the material.   We have the LFS, large free sphere, and LSFP, large sphere free path, with...which basically is the same thing as the large included sphere, but can encode the information...   the difference in the pore diameter influence of several functionalizations within the pores. We also have the SSA and the SSAV, which is the specific area, both in square meters per gram in square meters per volume unit.   AVF, which is the accessible volume fraction, which is how much of the material is formed by pores and Vp is the pore volume.   And then, we have the N(chan), which is the number of accessible channels.   Here in the figure I'm showing a structure with only one channel, but three-dimensional materials can have more than one accessible channel.   We have ChanDim, which is the dimensionality of the channels that can be 1D, like in this structure, 2D and 3D. And the channel size, which is basically sometimes the pore diameter, but for ??? structures can be a little bit different.   And now, Michael will talk about a little bit of the results. Michael Nazarkovsky Yeah, thanks a lot. Thank you very much, Felipe, for such a brilliant introduction, so now it will be pure data analysis and data science. So let me please share my screen in my turn okay. Okay, this is what I'm going to do right now.   Perfect. Good stuff.   Okay, so let's start from from the beginning of the second part. The time has come to analyze the data of those...of these...such a huge, I would say, amount of structures obtained by Felipe.   Just a moment. Okay, good. After the first, let's say, from before any analysis, they have to retrieve the data and exclude some erroneous structures or some irrelevant   and screen to detect missing values. No missing values have been detected during their analysis and screening. We also have discovered after the   data visualization that the absolute majority of the structures available are assigned to to the structure more 85% and 15% all into a 3D structure. So from the beginning, we have unbalanced data set now to be predicted. Absolute majority of the   channels is assigned to monodimensional   channels and there are just more...more than 80% and almost 17% is attributed to 3D dimensional and just less than 2% is attributed to the channels that are by bidimensional.   Moreover, all these channels are quite isotropic so it means that in all directions (X, Y and Z) they are equal, which is reflected on data three dimensional 3D diagram.   If we distribute all the data between the structures, I mean 2 and 3D structures, we could see the monodimensional, bidimensional take the absolute majority. More than 90% for that 2D structure and the 3D are is taking.   3D, I mean 3D, three dimensional channels. Almost more than 60% are taken by 3D structures, however, quite enough...quite a high amount of   of three also is taken by 2D structures, because just they have a much more...a much higher value like 85%.   To avoid that, to avoid such an unbalanced   situation, we are undertaking some unsupervised   approaches, together with supervised methods of the machine learning   after the data preparation. So what we do in the first first ??? of these analysis, multivariate analysis to detect colinearity   and correlations between the parameters. 
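For readers following along, here is a hypothetical JSL sketch of a tiny descriptor table with the columns just described. The layout follows the talk; the two rows of values are invented purely for illustration.

// Hypothetical two-row table with the structural descriptors described above (made-up values)
dt = New Table( "COF descriptors",
	New Column( "Structure ID", Character, "Nominal", Set Values( {"COF-A", "COF-B"} ) ),
	New Column( "LIS",  Numeric, "Continuous", Set Values( [27.1, 11.4] ) ),  // largest included sphere (pore diameter)
	New Column( "LFS",  Numeric, "Continuous", Set Values( [26.8, 11.1] ) ),  // largest free sphere
	New Column( "LSFP", Numeric, "Continuous", Set Values( [27.0, 11.3] ) ),  // largest sphere along the free path
	New Column( "SSA",  Numeric, "Continuous", Set Values( [1800, 1250] ) ),  // specific surface area per gram
	New Column( "SSAV", Numeric, "Continuous", Set Values( [950, 820] ) ),    // specific surface area per volume
	New Column( "AVF",  Numeric, "Continuous", Set Values( [0.62, 0.41] ) ),  // accessible volume fraction
	New Column( "Vp",   Numeric, "Continuous", Set Values( [1.10, 0.45] ) ),  // pore volume
	New Column( "N(chan)", Numeric, "Ordinal",  Set Values( [1, 3] ) ),       // number of accessible channels
	New Column( "ChanDim", Character, "Nominal", Set Values( {"1D", "3D"} ) ) // channel dimensionality
);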
So, as we can see that first three (LIS, LFS, and LSFP) are highly correlated so one can be substituted by others. However, such a characteristic will be very important, while the analysis of PCA   (principal component analysis). This is why, for hierarchical clustering, we selected just only one parameters of them.   In case also X, Y and Z for a channel size, our isotropic means the same. However   a decision was taken to keep all of them, because this is highly related to the native structure of the 2D...of the 2D compound...compound structures of key COFs.   Well, due to their high amount, almost 500 structures, we decided to select to 13 clusters, which can be more significant and relevant optimal by ratio or the portion taken from the...all number...entire number of clusters and analyze them   for the mean   parameters, which are reflected on the table.   As we can see that, after a number 13, there is a another huge   reduction of the clustering... cluster distances, so this is why 13, it was assumed to be the optimal amount of the clusters. And from them we selected them most massive cluster, cluster #7, which has contained...comprises 128   samples of 2D structure, so it will be used for classification afterward.   In turn, 3D structures are, how do I would...I would dare to say that already are the same. We have still observed still a high correlation between LIS, LFS and LSFP, so this is why also   the conditions for cluster extraction were the same like for 2D. However with lower amount of clusters, like we decided to take seven. However from these seven clusters, we extracted four, just to keep at least   the ratio between 2D and 3D closer to equality. And it will be not equal,   however, as you will see on the next slide, it will be much better, less unbalanced than 85 to 15.   From comparing two tables about variability variation of the parameters, we can see that the last four in both cases are responsible for higher variability of the structures within each set. Volume fracture...fraction...sorry,   total pore volume and both types of specific surface area, so they are on the same level and they are responsible for higher variability of by among the structures. So as I said, we got less...   less difference between two structures in the end.   Validation...trained validation played two sets were built and stratified on the basis of the obtained structures from...taken from the most massive clusters and well distributed.   And let's get started with logistic regression, as the most simple parametric...parametric analyses. After taking out the most irrelevant   variables, we left only two total pore volume and volume fraction.   They are highly correlated also, like   a very huge percentage of correlations are. However, it would take one about, as you can see, sorry, the same LogWorth, so it means that we cannot exclude,   for example, volume fraction...fraction because I did it once to test how it will work and the lack of it became significant and   misclassification rate in validation got 20%. And as compared to here, we got on the validation, zero misclassification for full right...correct attribution to each group   and with low...pretty low misclassification in the training set. The rest of the metrics...measures of the model performance will be more discussed in the end, so when we will compare all the performance of all the models. 
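A rough JSL sketch of these two steps follows. It is not the presenters' actual script; the platform option names (Method, Number of Clusters, Save Clusters) are written from memory and the column list is illustrative.

// Pairwise correlations among descriptors, to spot collinear ones (LIS, LFS and LSFP track each other)
dt = Current Data Table();
dt << Multivariate( Y( :LIS, :LFS, :LSFP, :SSA, :SSAV, :AVF, :Vp ) );

// Ward hierarchical clustering on a reduced descriptor set, cut at 13 clusters for the 2D structures
hc = dt << Hierarchical Cluster(
	Y( :LIS, :SSA, :SSAV, :AVF, :Vp ),
	Method( "Ward" ),
	Number of Clusters( 13 )
);
hc << Save Clusters;   // adds a Cluster column, used to pick the most populated cluster (cluster 7 in the talk)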
Also their respective   equations to predict the probability, predicted the structure was created and given, was obtained and given...presented on this slide. So let's go further. This is why, in the next   modeling, we used only these two parameters...parameters to predict both structures (2D or 3D), so kicking nearest neighbors as a non parametric model has given them the best   metrics...best performance and validation at 1K, so only one neighbor is enough just to recognize to assigned...to recognize the structure. As we can see from the diagram of...   I mean, which a total pore volume and volume fraction, sorry, so we can see that   for 2D we have more concentrated amount of this structure. It's   very   concentrated in one spot. However, in contrast to 3D, they are quite spread. So this is why it would compare the distances. Here in the right part of the slide, you can see that the highest distances between the neighbors are related to   3D structures as in validation, as in training set. So this is why we can use also these non parametric model...a single non parametric model to apply in the attribution using just these two parameters.   Also quadratic discriminant analysis gives zero misclassification in the validation and less than 5% of misclassification in the training set, just misclassifying eight samples. So it's even easy...is seen as on the confusion matrix and also on the parallel plot diagram.   And also the rest of the...   rest of the parameters will be described...will be discussed later because these model has some tricks...it will be...   more interesting...will be...we will see in the end. So Naive Bayes is another parametric but very simple model, has given also zero misclassification. So the situation is becoming quite competitive...competitive.   As we can see about here also both parameters are almost equal by their effects so   the total effect is almost 0.88 for total pore volume and for volume fraction is almost eight...0.8, so this is why they are almost equal in contribution for their classification task.   And simple also plots were built for prediction.   Support vector machine has not been demonstrated, also brilliant performance. As we can see that of 64...sorry 46 ??? vectors were used to build such a model with zero misclassification in the end. And some   misclassification in a training (6%) and mostly, the biggest percentage of misclassification was for for exactly 3D, they were they were misclassified.   Okay, so let's go further. And finally, we're coming to a complex model which combines unsupervised and supervised approaches, unsupervised PCA. In this case, we took all all all them...   quantitative continuous parameters, including all, even those which are highly correlated with each other, and we built   diagrams and plots for PCA. And as we can see, two main components which has the values by   more than one by Kaiser rule, so it means that at this point, we have another almost close to one component, third component, we will not involve it in the prediction model. We will involve only the first two   where those Eigenvalues were more than one. And what is interesting, here we can see that in the first component, the principal component which can describe almost 66% of the cases, these comprises,   let's say, LIS, LFS, LSFP and channels...channels parameters, so I mean size X, Y and Z, those directions. So this is like, let's say, ??? channels, ??? channel set.   
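A hedged JSL sketch of the two-predictor logistic fit described here; :Dimensionality (the 2D/3D label) and :Validation are assumed column names, not the presenters'.

// Nominal logistic regression of 2D vs. 3D structure on the two retained descriptors
dt = Current Data Table();
Fit Model(
	Y( :Dimensionality ),              // 2D / 3D class label (assumed name)
	Effects( :Vp, :AVF ),              // total pore volume and accessible volume fraction
	Validation( :Validation ),         // stratified training/validation column (JMP Pro)
	Personality( "Nominal Logistic" ),
	Run
);

The other classifiers compared in the talk (k-nearest neighbors, quadratic discriminant analysis, Naive Bayes, and support vector machines) are separate platforms that take the same kind of Y, X, and validation roles.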
And the second, the second component is mostly good, could be described by both types of specific surface area and porocity. This is a pure textural parameters.   So, as we can see that even here, we are observing such a certification difference and can make conclusions about the contribution of both parmeters into the prediction, in this description of the.   structures. So after we build logistic regression, using only these two principal components, their formula will be given in the...in our publication, on our abstract afterward in the paper where, which we're going to submit to within the range...within the scope of the present summit.   Here also, obviously, we got zero misclassification rate and very, very low, less than 1% misclassification in training...while training, which is not...which is...was not...which was not observed for the previous models. So now we are coming to the most   interesting part, the conclusion about our   summarized   models. So as we can see, we summarize by misclassification the training set, obviously because we got some values there. We can see the logistic regression...the regression models   like simple and PCA based, here on the top of the chart by misclassification and by validation discriminant analysis. However,   what we can say here, things we have all all their misclassification arrays or accuracy as a opposite of parameter, we should compare other metrics.   So what we can see that, okay good, we have also by other metrics, higher values so advantageous of discriminant analysis. But if we would take a closer look,   we will see...everyone will see that there is a huge difference between these metrics, even between the misclassification rate, between validation and training. So we can make   some thing, you know that we we we encountered some troubles with overfitting or underfitting. In this case, we mostly deal with underfitting since metrics invalidation are better than   for training. Will normally occur opposite, yeah, most...most of the cases deal with the overfitting when the training is overfitting the data and we obtain metrics worse in validation. Here is completely opposite, so this is why   these metrics, like entropy R square and generalize R square and mean log P, and also misclassification rate,   were analyzed...differences were analyzed for these four metrics, and this is why we're selecting the best of the best. As mentioned before, discriminant analysis demonstrated the highest   quality of the performance, however, it has one of the hugest differences between the metrics of validation in training, as compared to other ones. Remember that   on validation, all of them gave zero, so it means we can select almost   every of them, with the optimal balance of other metrics...differences for the metrics between validation and training. So what we can see here, for example, the lowest...   lowest difference between misclassification rate and entropy R squared is attributed to logistic regression, combined with PCA.   Other two metrics, like generalize difference between...difference in generalized R square and mean log P is observed   in for support vector machine. For Naive Bayes, we cannot select this model, however, because at least in two   parameters, it has a very huge difference. So this is why   we can assume that for the best performance, for the most correct, logistic regression based on PCA is recommended, and also optionally recommended, support vector machine as another model to predict 2D and 3D structures. 
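A rough sketch of the PCA-plus-logistic combination described above, again with assumed column names; the saved score column names (Prin1, Prin2) may differ in practice.

// PCA on the continuous descriptors, then a logistic fit on the first two components
dt = Current Data Table();
pca = dt << Principal Components( Y( :LIS, :LFS, :LSFP, :SSA, :SSAV, :AVF, :Vp ) );
pca << Save Principal Components( 2 );   // saves the first two component scores to the table

Fit Model(
	Y( :Dimensionality ),                // assumed 2D / 3D label
	Effects( :Prin1, :Prin2 ),           // saved score columns (names may differ)
	Validation( :Validation ),
	Personality( "Nominal Logistic" ),
	Run
);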
More details and methods will be given in our paper, which we are going to submit within two weeks   and publish on the JMP Community website. So thank you very much; it was a big pleasure to give this presentation. I also gratefully acknowledge my colleague David Kirmayer from the Hebrew University of Jerusalem in Israel. Thank you for your attention. Josh Staunton Alright.   Thank you, gentlemen.
Charles Chen, Master Black Belt Q&R, Applied Materials Mason Chen, Black Belt Student, Stanford OHS   Traditional Six Sigma curricula that include DMAIC, DFSS and Lean were not developed specifically and effectively for today’s AI data scientists. This paper demonstrates an innovative Six Sigma training curriculum for data scientists, built around several objectives: Adopt modern JMP Text Explorer and data mining techniques for root cause analysis and problem solving. Integrate various JMP platforms holistically to recognize patterns and discover insights. Enhance predictive modeling capability through neural, partition, and principal component analysis. Utilize modern quality and process platforms such as the goal plot and model-driven multivariate SPC. Map these modern JMP platforms into the Six Sigma DMAIC/DFSS/Lean framework for Six Sigma project execution.   This innovative Six Sigma curriculum is applicable to industrial professionals, both managers and individual contributors, who make critical decisions in big data and AI businesses. This curriculum is not just for data scientists; it is also a powerful tool for design and process engineers, quality and reliability engineers, supply chain engineers, business analysts, statisticians, and marketers who want to become reliable decision makers and true project leaders.       Auto-generated transcript...   Speaker Transcript Jason Wiggins, JMP Okay, go ahead when you're ready.   Okay.   Hello everybody.   My name is Charles Chen. Today we are going to talk about an interesting topic regarding the STEAMS and DMAIC curriculum for data scientists using JMP 16.   We are three authors. Mason Chen is the first one, I'm Charles, the second one, and Patrick is the third one. We are from the Stanford Online High School STEAMS Club. Mason Chen is a high school junior and the club leader, and Patrick and I are the advisors for the club.   The project overview and opportunity statement: the traditional Six Sigma DMAIC process, combined with the interdisciplinary STEAMS methodology,   can help data scientists make a greater contribution in the field of big data. Our project objective is to develop a Six Sigma data science training curriculum for high schoolers   all the way to industry professionals by mapping the JMP 16 platforms onto the DMAIC phases.   Before we start, the audience may be curious: Six Sigma is for professionals, and data scientists are also professionals,   so why do we mention this for high schoolers as well? Here we want to use the case study of our first author, Mason Chen.   His experience started when he was 10 years old, when he began to receive data scientist training and got certified in IBM SPSS and Minitab.   When he was 11 years old, he was IBM Modeler Data Mining certified. Then he moved to JMP, studying it at about 12 or 13 years old,   including JMP STIPS, DOE, the certification exam, and JMP 16 text mining. This year, he also finished linear algebra   college courses and data science through our program. So today, based on his experience and the advisors', we will introduce how we can use JMP 16 to   create the Six Sigma data science program. Mason is also moving to R in the future; as he learns R, he can also learn the JSL language.   And that's Mason's picture, when he was attending the ASA conference two years ago.
So our project tries to connect the STEAMS and the DMAIC. So STEAMS is the Mason founder STEAMs program about 2017 from the   traditional STEM program. So what's the difference? So the science, technology, engineering, mathematics still the same, understand?   But we added artificial intelligence and also statistics, [so it] becomes STEAMS. And the data science is a popular combo of artificial intelligence, mathematics and statistics.   And the traditional Six Sigma DMAIC map fairly well to the STEAMS methodology, fairly well. You can see a lot of close thinking. (??) Some other problem solving tools   from the DMAIC. They also map fairly well to the science field, or even engineering, problem solving. With the data science, they   may even promote more from STEAMS to the DMAIC. So that is how we think we can develop the Six Sigma data science program for the high schoolers based on the STEAMS methodology.   So the first step, once the JMP 16 released earlier this spring, that we look at all the JMP 16 platform and identify   what are the platforms that could be good for the data scientist. So we based on the big data the three Vs - volume, variety also velocity.   And, based on the three Vs, which are the tools or the platforms that may fit well to the data science program, so such as a Graph Builder,   Text Explorer, tables, predictive modeling, screening, multivariate methods, clustering, quality method, problem solving. Now we are going to map all these tools to the DMAIC phases.   Also through learning the JMP 16 platforms, we also try to understand also [inaudible] behind these JMP tools.   And we find out the data science statistics may cover the following modules. The first one, the traditional DMAIC for the quality and reliability engineering, such as MSA or SPC.   And the next will be the Design for Six Sigma, most for the modeling, DOE modeling, the Monte Carlo simulation, robust tolerance.   Then we find out linear algebra is very important for the data science, such as principal component analysis or the singular value decomposition. And next one will be the data mining, including the classification, neural network, partition or random forest.   And time series or forecasting is also very important for the data science to handle the time (?) data, so we find out that such as the ARIMA model.   And text mining also is good for the data not structured. And survey and the consumer research also very important to understand because of the voice of the customers.   Also, how to do the marketing segmentation. So all these statistics we found out all are critical to know in order to utilize a JMP platform.   Then we try to map   all these JMP 16 platforms to the traditional Lean Six Sigma BB modules. So we identify the traditional BB modules, and we develop three   training programs. The program A is the more traditional DMAIC Black Belt training program, and these are the JMP 16 platforms associated with. The second one is the more modern data mining   [inaudible] categorical data. So we try to speed (?) the JMP 16 platforms to each different training program. And because the students or the training, they may have a particular interest, so we can customize   their particular interest and field.   From now on, we'll try to map all the tools or the concepts to the DMAIC phases. The first phase is the Define phase. The main concept for the Define phase is to try to define the problem statement, including the voice of the customers or the voice of the business.   
And also, we need to define the project goal and objective, especially about the Critical to the Quality CTQ, in Six Sigma language.   They also need to define the success criteria and any spec limit.   Also in Define phase, team building is very important for forming, storming, norming and performing.   And the associated JMP 16 platforms, like build database. Query Builder is very powerful. Data visualization is important.   Data mining try to cluster the customers. Market research, like consumer research, try to do the marketing segmentation; it is very important. On the right hand side is the simple example about how we do the marketing segmentation using the clustering.   Clustering affinity analysis. We probably can group the customers and then try to set up a strategy about what's the marketing priority.   The second phase is the Measure phase. The main focus is on process capability and the process stability, especially for the larger-scale manufacturing production.   We try to find three   powerful tool from JMP. The first one is a goal plot. So goal plot can plot the lot-to-lot process capability and into the two dimensions.   Through the different colored zones, we know that [inaudible] and it's not suitable.   The middle process performance plot is also very powerful because they combine the process capability and stability. If it can pass both criteria, you'll be on the upper left, so that means in green. If you fail both criteria, you'll be in the lower left or lower right.   On the right hand side is a process history explorer so they can list all the historical   past performance. And through this kind of list, you may identify what kind of factors associated with the poor yield.   The next phase is Analyze phase. In Analyze phase, root cause analysis, summarize complex data sets, visualize and discover patterns and insights, isolate and screen for important factors.   Based on these Analyze subjects, we also identify the fishbone diagram, Tabulate, Text Explorer, multivariate based methods, clustering, also the categorical response analysis.   For this slide, we want to introduce the Pareto plot. The JMP 16 Pareto plot, they can do the two-dimensional, so they can find more combination (??) or the pattern recognition among different kind of factors.   And the fishbone diagram was always a very powerful and useful for the real time, work on section, you can.   design or customize the fishbone diagram.   For the data summarization, Tabulate is the one we highly recommend. It is like   the Excel pivot table, [inaudible] convenient and powerful   because they can also add descriptive statistics.   In the middle, the Text Explorer is so powerful for analyzing the text database, so they can search the keyword and the phrases, then we can even convert the text mining to the data mining.   On the right hand side is for categorical data to find associations between different kinds of factors.   And the different colors, they present different kind of variables, so for any pair they are close to each other and they are far away from the origin and that's the focus about a high association; that may tell you something or some kind of insight among these variables.   The next phase is the Improve phase.   For the Improve phase, we try to build predictive models. We try to design new experiments as needed. We try to improve the production quality.   Regarding the JMP 16 platforms for predictive modeling, we highly recommend the Prediction Profiler or custom profiler. 
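Minimal, hypothetical launches for the Analyze-phase platforms named above; all column names are invented for illustration and option spellings may need adjusting.

dt = Current Data Table();

// Pareto plot of defect causes
dt << Pareto Plot( Cause( :Defect ) );

// Pivot-table style summary with Tabulate
dt << Tabulate(
	Add Table(
		Column Table( Analysis Columns( :Yield ), Statistics( Mean, Std Dev ) ),
		Row Table( Grouping Columns( :Line, :Shift ) )
	)
);

// Keyword and phrase analysis of free-text failure descriptions
dt << Text Explorer( Text Columns( :Comments ), Language( "English" ) );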
And DOE, we can   have different kinds of custom DOE, mixture DOE, or even the DOE augmentation. For the specialized models, like machine learning, right, we have the JMP neural network, partition model, also different kinds of screening.   For the survey and consumer research,   JMP has the choice model, also the maximize difference design model, too.   For the design optimization,   On the left hand side, the Prediction Profiler, we can do the sensitivity analysis. We can do the Monte Carlo simulation so we can simulate the non-conforming [inaudible]   percentage.   In the middle, if we have many factors in the model, the Custom Profiler will be the good choice to find the optimal model amongst so many factors.   On the right hand side, for group orthogonal supersaturated DOE. This is the very powerful DOE to analyze for an option to the downstream. And they use a blocking factors or concept in order to minimize the number of the DOE runs, to reduce the cost.   For the predictive models, we will recommend a neural network, a very powerful   transformation and try to find the best model to make it   get a higher training and validation   fitness. For the partition, this is a binary split. So they can split the data set into different kinds of categories, so they can   highlight also conclude (??) the major contribution.   For the design optimization, we also recommend some kind of screening platform available in JMP 16. The first one on the right hand side is response screening.   JMP uses the   FDR, false detection rate, to determine how good is your prediction modeling. So if your curve keeps lower, okay, and   and that means you have good prediction modeling. When you're going up, and that means this portion, you don't have good prediction capability.   For the middle one, it is process screening. So you can see all the process parameters. It could be input parameter. It could be output deliverables.   And JMP uses the green color and the red color to identify the up shift or down shift. So if you see more green or red, that means your policy is not very stable.   For the right hand side, also very powerful. It ranks all the predictor contribution, so they'll give you first the top few, or what we call the vital few, the predictors. So you can find root causes or find the solutions.   For the consumer research, JMP had choice design   to help the consumer, how to pick their best product or the choice.   For the maximize difference design, and this is also the other survey design. For the survey, you pick the most also the least preferred items. So JMP can run the model to rank your preference.   For the last Control phase, [inaudible] scale up process control, sustain improvement over long period, upstream to downstream, multivariate process control.   And then the JMP platforms will be classical control chart, time sensitive control chart, multivariate control chart. Consumer research will be the multiple factor analysis. Also the time series analysis, like decomposition, smoothing and ARIMA model and forecast.   For the multivariate control charts on the left-hand side, we can find change point detection. So this is the point that will give you the biggest contrast before and after. So these are [inaudible] the root cause analysis about what happened or when is the most,   biggest change point. In the middle, is the T Square.   Model driven control chart. So this will give you the the failure mode,   decomposition about what are the parameters that contribute most to the OOC point. 
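Hypothetical launches for the two Improve-phase predictive models highlighted above (JMP Pro for the validation column); the factor and response names are placeholders.

dt = Current Data Table();

// Neural network with one hidden layer of three TanH nodes
dt << Neural(
	Y( :Yield ),
	X( :Temp, :Time, :Pressure ),
	Validation( :Validation ),
	Fit( NTanH( 3 ) )
);

// Recursive partition (decision tree) on the same factors
dt << Partition(
	Y( :Yield ),
	X( :Temp, :Time, :Pressure ),
	Validation( :Validation )
);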
On the right hand side is a multiple factor analysis.   Based on the eigenvalue, eigen factor analysis. So this is like a affinity diagram. So they can group the similar factors together, based on the eigenvalue, eigen factor.   For the time series analysis on the left-hand side, they can do the model diagnostics. So they can identify the trend seasonal and cyclical components.   In the middle, they can use the ARIMA models to fit the data using the seasonal or non-seasonal models. For the forecasting, they can find the optimal model to predict the future point.   So the takeaway from this talk...Traditional Six Sigma DMAIC and interdisciplinary STEAMS method can help develop data scientist on leadership and team building.   Modern JMP 16 platforms are mapped to the DMAIC phases to help deploy Six Sigma projects in data science fields. Database management, applied engineering, statistics, data mining and text mining are all critical to today's data scientific analytics.   This will conclude our presentation today, and thank you very much for your time.
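For reference, minimal hypothetical launches for the Control-phase platforms discussed above; column names are placeholders and the exact launch roles (especially for the model-driven chart) may need adjusting.

dt = Current Data Table();

// Classical control chart of a key output, subgrouped by lot
dt << Control Chart Builder( Variables( Subgroup( :Lot ), Y( :Thickness ) ) );

// T-square style monitoring across several correlated process parameters
dt << Model Driven Multivariate Control Chart( Process( :P1, :P2, :P3 ) );

// Time series view of a monitored signal; decomposition and ARIMA fitting follow from the red-triangle menu
dt << Time Series( X( :Date ), Y( :Thickness ) );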
A pitcher in Major League Baseball relies on a combination of strategy, deception, skill, and execution to be successful.  Their sole job is to get an out for each batter they face.   To do so, they analyze the situation for each pitch and ask themselves questions like, "How many balls and strikes are there?  Are any runners on base?  What’s the score?”  They also consider decisions like “What pitch should I throw and where?  How fast should I throw it?”   Based on the answers to these questions, the pitcher will carefully decide which approach he thinks has the best chance of sending each batter back to the dugout.  The release point of the ball, spin rate, and breaking amount all contribute towards physically executing each pitch to the best of his ability.   Most pitchers are likely aware of their biggest strengths, but do they have any hidden strengths that aren’t being used to their full potential?  How does a pitcher’s actual success and potential success stack up with others?   These questions are answered using JMP 16 Pro’s new enhanced log and model screening features, JMP’s R functions to access Bill Petti’s “baseballr” package from Baseball Savant at MLB.com, and more.   Important Note -In the abstract (and at 1:28 in the video), I mention using JMP's R functions to access the "baseballr" package. This data was originally accessed a couple of years ago under different versions of JMP and R, and unfortunately they are not currently compatible. I have included two options below for accessing the data used in the presentation. Option 1: Copy the contents of "baseballr_2018_season_accumulation_script.txt" to an R script and run it in R Studio. This option is longer than Option 2, but it shows you how the data is pulled in R via the "baseballr" package Must have R software and R Studio installed - https://cran.r-project.org and https://www.rstudio.com/products/rstudio/download The R script will take several minutes to run due to data size limitations per pull and will save "MLB 2018 Regular Season.csv" to your current working directory in R Studio Option 2: Download "MLB 2018 Regular Season.csv" from the following link: https://gtvault-my.sharepoint.com/personal/rcooper60_gatech_edu/Documents/JMP%20Discovery%20Summit%202021/MLB%202018%20Regular%20Season.csv    To Run the Enhanced Log Script -Open "MLB 2018 Regular Season.csv" in JMP 16 Pro (this will take a few minutes) -Run "enhanced_log_script.jsl" to perform the data manipulation and model screening actions explained in the presentation  
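For Option 2, the JMP side of those two steps can also be done from a short JSL script; this is just a convenience sketch, and it assumes both files sit in the same folder as the script.

// Open the downloaded season file, then run the provided analysis script
dt = Open( "MLB 2018 Regular Season.csv" );
Include( "enhanced_log_script.jsl" );   // performs the data manipulation and model screening steps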
JMP 16 made us all coders. With Action Recording in the Enhanced Log, point-and-click data work can now be saved as a replayable script. Learn some intermediate JSL techniques to future-proof your captured code, making it more general, more robust, and easier to use. After a brief review of basic JSL syntax, we discuss case studies illustrating three techniques: Using loops to perform actions on many columns at once. Extracting numeric values from a JMP report, for reuse later. Allowing users to specify data files or columns at run time.     Auto-generated transcript...   Speaker Transcript Jordan Hiller Hi, everybody. I'm Jordan Hiller and this talk is Steal This Code! Three Upgrades for Scripts Obtained from the Enhanced Log.   And here's the inspiration for this talk. It's a book by Abbie Hoffman, a provocateur and activist. Yeah, he named his book, Steal this Book, which is a great title. Not sure it's a great business plan to name your book that, but anyway it's it's very catchy.   So why steal this code? JMP wants you to steal code. The idea is that you point and click and do your analysis in JMP,   and that code behind your point and click is saved. It's saved in the enhanced log and you can grab it and reuse it.   And that's about 90% of your production script. You're going to grab that and reuse it so that you can   automate common tasks and do it with a click of one button, instead of a sequence of point and clicks. So that's the whole goal here, you're going to save   script from the enhanced log and that's the the bones, the skeleton of your production script.   There's about 10% left over that you need to write yourself, and that's our goal for this talk. We're going to talk about some of that connective tissue that holds production scripts together.   So in this talk, I'm going to focus with you, you know, not so much on the on the nitty gritty of the syntax,   much more on the big picture, and that's in line with what I think is a good way to learn JSL programming. It's much better to take existing code and learn by example. Use that code.   Find that snippet that does what you want to do. You're going to find it on the JMP user Community or maybe in the scripting guide.   And just copy it and adapt it for your use. You don't have to know all the complexities of the syntax. You'll get really far that way. Learning the syntax comes along as you gain experience.   So we're going to talk about three case studies about things you can do to enhance your code from the enhanced log, but first let's do a review of JSL. Maybe it's an introduction to some JSL basics for some of you.   So the important thing for you to know about JSL if you're starting out is that everything is an object. Everything that you   want to do something to is an object, so that includes data tables and columns in data tables, and platforms, like the distribution platform, graph builder, Fit Y by X,   and the reports that come from those platforms. So when you have objects, the way that you operate on them, is you send them messages. So objects receive messages. Here's a really simple example that I copied from the enhanced log.   Open the Big Class data table in the sample data directory and run a simple distribution analysis, right. And this is what that code looks like, just copied from the log.   Now.   What we want to do, the first thing we want to do is name the data table. Give it a name, and for that I took the code from line five over here and I enhanced it. 
I   added this bctable= before the rest of it, before the open part of the statement, all right. And what that does is, as we open that data table, we create this name, bctable. You know, I chose bctable, you'll see dt in the documentation. Use what you want, this is assigned by you.   So let's run this by highlighting it and clicking the run button, opening Big Class and simultaneously assigning the name bctable. Now the virtue of doing that, why is it important, it's because now I have this   handle. It's a hook I can use in the code later, bctable. Whenever I say bctable I'm talking about Big Class.   So   here's the next line, but distribution command, and I have modified that too. And you know, before here from the log that's a naked distribution command. It doesn't have...   it's not pointing at a data table. When you run it this way, JMP will just run it on the data table that has focus,   the current data table, that happens to be big class, because you opened it just before that, okay. But it's always good to be explicit. It's a good idea to explicitly say which data table you want to operate on, so we'll run it this way.   bctable, and then the double arrows and then the distribution commands. So that's what we've added, bctable with the double arrows.   And this is the syntax for sending a message. So we're sending the distribution message on the right to the data table, that bctable is named for, on the left. And you can see that that's Big Class when I hover over bctable. So running the whole thing,   we get the distribution report   that we wanted.   Now here's...we can take this a step further and, often, we need to do this. We're going to create a name for the platform as we run it, so   here's a line that incorporates both the double arrow and the equals, alright. So so let's uh...let's read this from the right to the left. We're going to take that distribution message and send it to the bctable.   And the result of all that, which is the platform, we're going to save in this variable I'm calling distplat.   Let's run that.   All right, looks the same, but the difference is now I have this this JSL variable distplat that I can use later on in my code.   What would you use it for? You might use it to add red triangle options to the output, so here let's let's bring that...   let's bring that distribution report where you can see it.   And we're going to take that distplat object, that's the...that's the distribution platform.   See, hovering over it, and let's run this line. Turn on the normal quintile plot in the distribution platform. There it is.   Okay.   Last step is to take the platform, to take this object here, and to   get a report from it, a report object. That's this line. Send the report message to the platform object, and now we get a report object.   And we need to do this if you want to access these values in your code -- any numbers or graph or graphical elements in the in the report -- you'll need this, okay. This object rdistplat. So just running it one more time, rdistplat.   I ran it and you don't see anything visibly, but what happened is now I have this object, rdistplat. It says it's a display box, and later on we're going to talk about how you can use that to access numbers here in the report.   Okay I'm going to clear my log   and start with our   case studies.   
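Pulling those pieces together, a minimal sketch of the naming pattern from this section, mirroring the Big Class example in the talk:

// Name the table as we open it, name the platform as we run it, then grab its report layer
bctable  = Open( "$SAMPLE_DATA/Big Class.jmp" );
distplat = bctable << Distribution(
	Continuous Distribution( Column( :height ), Normal Quantile Plot( 1 ) )
);
rdistplat = distplat << Report;   // display-box tree, used later to read values out of the report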
So the first technique that we're going to learn is repeating the same command over and over on different columns in the data table, iterating over columns with loops using the for each command. So   this is a really common need, sometimes it's just something simple that you need to do, like changing the format or the modeling type of a column.   It could be very complex like   taking a...taking a transform of the column and running a control chart and doing something with that control chart. So yeah, it scales; this technique scales.   Here's how we're going to achieve this in JSL code. We'll point and click first, right, do what we want to do. Do it for a single column, and then we're going to grab that code using the enhanced log,   and then that's the skeleton of our script. And we're going to modify it in order to loop over all the columns that we want to perform that action on.   The method we're going to use for for looping, for iterating, is is for each, and that was introduced in JMP 16, so this technique is not going to work in earlier versions of JMP.   This also uses a concept called JSL lists, so we're going to talk about that a little bit too.   Okay, I am going to start   in JMP, let me start at the home window.   And I'm going to...let me just check the log, make sure it's clean. No, I got to remove one thing, clear the log. Okay, so let's point and click what we want to do. I'm going to open a data table called sc20.   This data table is available to you in the presentation materials that you'll find online.   And looking at this data table, it has an ID column and then it has about 20   columns of data. These have many, many decimal places, too much precision and I want to truncate it at two decimal places. So let's change the format so we're only showing the first two decimal places of data.   That's a simple point and click operation. Right click, go into the column info,   and we'll change the display format from the default best to fix decimal with two places.   OK, and now we did that for NPN1, we have 19 other columns we'd like to do that for, as well.   So let's examine the enhanced log.   And this is it, this is the code that we need to achieve that changing the format for one column, NPN1, right. And then to reuse this, go to the red triangle, and you can save the script either to your clipboard or even to a new script window.   I'm going to switch to another file, where I've done some work and so we can dig into this a little bit more.   Okay.   So the first two lines here...three lines, this is the code from the log. All right, the first thing we're doing is we're opening the data table.   And you might have noticed that I've edited a little bit, I edited out my long path name, my long directory path to this file.   If you run it this way,   it will work as long as the JMP data table, sc20, is in the same directory as this JSL file. So that's just like a local file path to this same directory that you're currently in.   So anyway, from the log we're just opening the data table, and then we are here, applying that format change, fixed decimal 10 characters two decimals.   The table...the column reference over here on the left is one way to address a column. There are several different ways, but this is a very common one you'll see with the colon. Table ref...ref...reference on the left side of the colon and the column name on the right side of the colon.   So this is again directly from the log.   JMP is using this way in the log to address the data table. 
I think it's better to do what we talked about, it's better to give the JSL name as you open it, and then use that JSL name going forward. So instead of this business before the colon, I changed it to dt.   So here's that same code, just...just fixed a little bit, and when I run it, yeah, it has the right effect. It opens the data table and it   changes the format to two decimal places. Okay so yeah, now that we have that, it's a pretty easy thing to   change all the 19 other columns, right. There...here's NPN1, and then I can just copy that line and change everyone and write it 19 more times.   Yeah, of course that's inefficient, and one of the great things about programming is is you...is you don't have to do the same thing more than once, right. So that becomes easier to maintain if we...if we just write this once, instead of writing it 19 times, 20 times.   So here's how we do that. The first thing that we do is we're going to get a list of all the columns that we want to operate on.   And when I say a list, I don't mean just a generic kind of list. I'm talking about a JSL list, I'm talking about a particular data structure that you use when you're programming in JSL.   A list is just a couple of objects contained in a container and that's how we use it. So here are three ways to create a list of column objects.   The first way is to is to do it manually, is to just type everything. So if I was to run this,   three table reference...three column references, NPN1, PNP1, PNP2.   Note the curly braces both in my code and also when I hover over that CNames variable. When you see those curly variable...those curly braces, you know you're dealing with the list in JSL.   Okay, so that's one way to get a list. Let's talk about a couple of other ways. This is the one we're going to use for this example, ultimately. This is   saying, hey, get all of the continuous columns in my current data table, well in the data table I name here, dt. So message object, create a new list.   So let's run this one.   And now if I hover over CNames, you'll see that's what we want. That's the list of all 20 that we want to operate on.   Okay, and the last way you can do this is, you can let the user select which columns in the data table. That looks like this. I'm just going to select two. Let's select IVP and PNP4, two columns selected.   And if I run this,   we get a list with just those two selected columns from the data table.   All right, so, however, you get that list, the next job is to apply that formatting line to each of the items in the list, and that's called looping or iterating.   So here's the old way to do it. It uses the for command in JMP. That's what you'd do before JMP 16 and I put it here, for your reference in case you're still on JMP 15. But,   you know, it's a little bit complicated and forbidding looking for new users, so there's a much more compact and readable thing you can do if you're using a list in JMP 16. If I want to repeat an action on every element of a list, I can use this for each command.   All right, and let's let's look at it and and see what it's doing. It's taking three arguments, the arguments are separated by commas, we'll take them right to left.   The last argument is that format command, right, applying the format, and instead of doing it for dt NPN1, instead of doing it for one column, I put a placeholder here and I chose list item; you choose what you want.   Right, this is just a word, a variable name, you you choose it. I chose list item. 
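Here is a compact sketch of where this example ends up; it assumes the sc20 table from the presentation materials sits next to the script.

// Two-decimal display format on every continuous column, using For Each (JMP 16 and later)
dt = Open( "sc20.jmp" );
cnames = dt << Get Column Names( Continuous, String );   // JSL list of continuous column names
For Each( {cname}, cnames,
	Column( dt, cname ) << Format( "Fixed Dec", 10, 2 )
);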
So that's our placeholder, and this is the command that we want to run on each item of the list. Which list? This list, okay. So that's our list, that's the second   argument. And then the first argument is the placeholder, right. It tells us that we're we're using that placeholder to hold the values of the things in the list.   It's in curly braces, so make sure you use that if you if you reuse this code.   So this is what the whole thing looks like, instead of 20 plus lines of code, I just have, well three lines of code, and that includes opening the data table,   getting a list of column names, and then repeating it...repeating that action for all 20 continuous columns. To run the whole thing, o you see what it looks like.   And there it is. Now each of these columns   has two decimal places displayed in the format.   Okay, so that was a very simple example of of using for each to do something very simple on   on a column.   Let me show you a slightly more interesting example.   Here's an example where, for the same data set, for sc20,   I'm going to make a control chart for every column, and I'm going to append those control charts to a journal and save the whole darn thing out as a as a PDF, okay. So   it's using the same for each, and it has the same first two arguments, you know, list item is first, a list name for the second. But now I have lots of code as the third item...as the third argument in for each.   So here we go. Let's run this whole script.   Opening, running a bunch of control charts, slapping them together in a journal. And now, if I look out on my desktop, I've created this thing, my report.pdf, that has all of those control charts, page one, and their page breaks to separate them. So page one, page two, page three.   So you can look at this example too on your own time. It's included in the presentation materials.   Okay, and that is our first technique that we're talking about. Let's clear the log for our next technique.   Okay, extracting a value from a report. This happens a lot in JSL. You do some sort of analysis, maybe it's a distribution, maybe it's a regression, you need to   get something, collect something out of there. Maybe it's a mean or median. Maybe it's a slope coefficient from the regression or a P value.   So that's our job, and the tricky part about this is, reports have a hierarchical, highly nested structure, and it can be a little bit tricky to travel down that hierarchy, travel down that tree to just find exactly what we want. So I'm going to teach you a trick.   The trick is we're going to point and click our way through the analysis, like usual, and then the thing we want to grab,   I'm going to change the text color to red, just a minor formatting change. And then, when we save that script to the log, we'll use that to learn how to address that red item   in JSL. So let's let's go through that together.   Back in JMP again.   I'm going to open Big Class, good old Big Class and let's run a distribution on the height column.   All right, here's my distribution report that I want to extract something from. Let's say it's the median, in this case the median is 63, and I'm going to put that into a JSL variable so that I can reuse it later, and you know write with it or do some math on it or something like that.   Here's the trick to find out how to how...to address that number 63.   Change the color or, you know, the font or whatever of this thing that I want to grab. So how to do that if you've never seen this, this is worth knowing.   
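And a rough reconstruction of the control-chart-to-PDF pattern described a moment ago; this is not the script shipped with the talk, and options such as Page Break Box and Save PDF are written from memory.

// One control chart per continuous column, appended to a journal and saved as a PDF
dt  = Open( "sc20.jmp" );
jrn = New Window( "My Report", << Journal );               // empty journal that collects the charts
For Each( {cname}, dt << Get Column Names( Continuous, String ),
	ccb = dt << Control Chart Builder(
		Variables( Y( Column( dt, cname ) ) ),
		Show Control Panel( 0 )
	);
	jrn << Append( Report( ccb ) );                        // copy the chart into the journal
	jrn << Append( Page Break Box() );                     // page break between charts
	ccb << Close Window;
);
jrn << Save PDF( "$DESKTOP/my report.pdf" );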
There's a button on the toolbar for properties, I believe that's a new button in JMP 16.   So if you turn on the properties for this report.   We can travel down this tree and find what it is that we need. So look at this.   number col box, right. And note that when I click on different parts of the report, the the corresponding   item, the kind of display box it is, is highlighted in the properties panel.   Okay, so we can see that this thing that has all these results is a number col box with all these numbers in it and, like, I told you we're just going to take this thing, and I'm going to change the text color to red.   Just like that.   Now I'm going to close the properties panel.   There's the report, and let's send this script to the enhanced log.   Red triangle, save script to the log.   And now viewing that log.   This is the part we want. It has, you know, the distribution command with the part that changes the text color of that median and those other statistics to red. So let's look at that in a little bit more detail now.   Okay, so here is that code...same code from the enhanced log. I just copied it here and I formatted it a little bit, so we can talk more about what we need.   So the whole reason we did this was to get this part, which we're now going to interpret, all right. It uses send to report and, within that, dispatch.   And we're going to really focus in here on the four arguments to dispatch. We need the first three, the fourth argument is changing the color, that's really not what we're interested in. We're interested in addressing the display box, the element that we want, okay. So   this first argument is navigation. It tells us, go down to display boxes, these are actually outlined boxes, go down to outline boxes   here. So let's look at that report. Again I'm sure you've noticed in JMP that in a report you have these levels of nesting, right, and I have just a...   an outer outline box distribution. Within that is an outline box height and, within that is an outline box quantiles, and inside that quantiles one is the is the stuff I need to extract.   So that's what it's telling me. It's telling me to walk down that tree. I just needed the height and quantiles and that's going to identify it for us.   Good. The next two arguments tell us something about about this...this element that we're trying, that we turned red. It tells us...the third one says what it is, it's a number col box, we saw that in the properties.   And the second argument tells us the title   of that number col box. This one's blank. This one doesn't have a title and that's okay. We can...we can still use this information to get at that data, so sometimes you'll see a title here, sometimes you won't.   Okay, now that we have this, we're going to take that information and we're going to use it   to extract the median.   Here's the code that does it. It starts here on line 23 and it goes through line 29.   We're opening the data table again,   naming the distribution, performing the continuous distribution.   You'll note that we don't have any of that dispatch and and send to report, that's because we don't need to turn it red in our production script.   So yeah, perform the distribution analysis, name the platform.   This is important, you have to...you're accessing the report layer. You're extracting it with this line, so that's creating rdist, this report object.   And then, this is the line where the magic happens. 
This is where we're grabbing that   that median value, assigning it to the JSL variable, mdn, and for this one, we're going to write it to the log. So I'm going to tell you a little bit more about this, but first let's just run it once altogether, so you see what it does.   I'm going to run line 23, whoops, let's do it this way instead. Let's run line...lines 23 through 29.   Performed the distribution. It's not red and we extracted the median, it is now stored in mdn, you can see that the value for that variable is 63.   median is 63. Perfect.   Alright.   So here's that line where we're extracting the   the median value, that number 63 from the report.   Let's let's look at that. I'm just going to call this up here so that we can   refer to this, as we, as we do it.   So here's that line. It's just again organized, so that we can we can look at each element separately.   63, the median value from the report.   Here's how we get to that.   This is called subscript...subscriptinig. When you see the square brackets in JSL code, that means that your subscripting an object. That object is a container of some sort, either display boxes,   often a list. You can subscript lists this way as well. So what we're doing is, yes we're extracting nested items in this report object.   The first thing we're going to is the height outline box. You remember that, and then from there we're going down to the quantiles box, you remember that.   And now we're in the right place. We just have to grab what we want, what we want is the number col box, right, and because it didn't have   a title that we could use (that was that was up here, no title), instead we're referring to it by number. It's the first number col box in that quantiles outline box.   So you might look at this and say to yourself, well wait a minute, there's there's numbers over here. That's that's a number col box too, isn't it? No, actually it's not. It's a string col box, and remember, you can check all that stuff if you want to. That's, again, from the properties.   And if I click into here,   number col box, string col box, string col box. See them over here? String...string number, so it's the first number col box.   And that's going to grab all of these numbers.   Okay, the very last thing we need to do is is this subscript. Subscripting this way says, get me the sixth element...1, 2, 3, 4, 5, 6. That's the median; it's 63. And when we run this line, that subscripts the report; it grabs that median, so that we can reuse it later.   So there's a how to grab something from a report. I have another example that I'll share in the presentation materials that will have   an element that does have a title and you can see, you can use that as a model to do your own work on, if you run into that situation.   All right, the last thing we're going to discuss is how to get input from your user, the person who's running the script.   And let them choose what file they're going to analyze or even what column in data table they're going to analyze. So this is maybe the most common   thing that that beginning scripters want to do. You know, I've set up my script to do something on one column, and I want to give something to other people in my organization   to let them do it themselves on different data, right. So if you...if you generalize your scripts this way to let your users choose either a file or a column, or both,   you're generalizing your script and you're greatly increasing its value. 
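Putting the whole extraction together, a sketch of the pattern; it uses << Get(6), which reads the same sixth entry that the talk reaches with a trailing [6] subscript.

// Pull the median of height out of a Distribution report on Big Class
dt    = Open( "$SAMPLE_DATA/Big Class.jmp" );
dist  = dt << Distribution( Continuous Distribution( Column( :height ) ) );
rdist = dist << Report;                                     // report layer of the platform

// Walk down the outline tree (height, then Quantiles), take the first Number Col Box, read entry 6
mdn = rdist["height"]["Quantiles"][Number Col Box( 1 )] << Get( 6 );
Show( mdn );                                                // writes mdn = 63 to the log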
You're going to give it to other people in your organization. They can replicate your work and do it quickly without going through all the clicks.   Okay, so in this section, it's less of a case study and more of a laundry list. We're going...I'm going to show you three methods that you can let your users choose files and two methods that you can use to let them choose columns.   And we'll start with the methods to choose files. As usual, we'll begin by doing something in the log and then copying that. Here's...here's what it looks like when you open Big Class, you just get this in the log, open Big Class.   dt. I'm using dt in this case, and so, when I open Big Class,   now it has...dt is pointing at Big Class, okay. And then I would go on with my production script, right.   Here's the thing, to let your user choose which file they want, all that we need to do is, we need to find a way to let dt point at something else.   Write the rest of your script, use dt, and then at the beginning, you can change this. dt is no longer just pointing at Big Class; it's going to point at something else.   Alright, and that's something else is determined by the user,   the person running the script. Okay, here's the easiest thing that you can do, I think, is you can...you can use this line instead of opening a data table. If you use this at the beginning of your script to assign dt,   what that means is that JMP is going to run this script and it'll give that dt   name to whatever data table is in front.   So from the user perspective, what they do is open a data table, I'm going to open sc20 here, and make sc20   my current data table. See that up here? That what's listed in in this window is the current data table.   And so, if your user opens a data table and then runs the script and then encounters this line, dt is now going to point at sc20.   So that's a pretty powerful method. The problem with it is...it is it requires your user to know exactly what to do. They have to know that they have to open the data file first and then run the script.   We can do something a little bit more robust. We can show them a file chooser when they run the script, so that they can navigate through their directory structure and find the right file to open.   It turns out that this is dead simple to do. What you do is you just get rid of all the stuff in here, you get rid of the reference to Big Class, and you just say open with no argument at all.   And when JSL tries...when JMP tries to interpret this line, it says, okay, user wants to open something, but I have no what...no idea what. Let's show them a file chooser so they can decide.   So let's...let's run this. dt = open.   Run the script. Here's your generic file chooser and from my desktop, I'm going to open Iris.   Iris is open, and of course, it has the dt name.   all JMP files, all files, right. It could be showing a lot of stuff in here.   So if you need more control, oops, if you need more control over   the choices that your user's going to make, instead of using open, we'll use something called pick file.   Pick file is a command that has a couple of different options. I'm only using three of them.   Let's go through this example. So the first three arguments for pick file are a title you'll see in the chooser window,   a directory where the chooser will start, and a filter that tells you which files you can open from that chooser, right. So let's run this...this line, this this section of code, the pick file command, saving the result in PF.   Run that.  
So if you need more control over the choices your user is going to make, instead of using Open, we'll use something called Pick File.   Pick File is a command with several options; I'm only using three of them.   Let's go through this example. The first three arguments of Pick File are a title you'll see in the chooser window, a directory where the chooser will start, and a filter that tells you which files can be opened from that chooser. So let's run this section of code, the Pick File command, saving the result in PF.   Run that.   Okay, note that the argument over here, Select Excel File, is what we see up here in the window title.   The folder, the path in the second argument, is where the chooser starts.   And the third bit, the file filter that says, hey, only show the user Excel files to open, that's down here. I am not allowed to change away from Excel files; I'm restricted to Excel files.   So let's say sandwiches, and click Open.   Well, wait a minute, why didn't it open?   Let's check. PF equals Pick File. What's PF? Ah, PF is the path to the thing we want to open. We haven't issued the Open command yet. Pick File doesn't open a file; it just picks a file.   So to close the circle, here's what we have to do: create our table reference by opening the PF variable, which points at the file we want, the Excel file.   There it is. That opens the sandwiches data table, and now I have my dt and I'm ready to go on with my script, just like I wanted.   Let me pause here to show you something important. If you encounter a command like Pick File and you don't know all the arguments and want to learn more, a couple of things. First, you get a little hover help, a little pop-up with some information about the syntax. Here's a better option: right-click on it in the script window and look it up in the scripting index. The scripting index is a very important tool for any JSL programmer. It has the command you're looking up, the list of arguments, some explanatory text, and, really importantly, a couple of examples that you can run and see what they do. If I just run this example here, it's going to give me a Pick File chooser.   Well, it sent me someplace I don't want to go.   Let's just close it.   And then we can experiment, grab this code, steal this code, and use it in our script if we want to.   Okay, that's the scripting index. It is also available from the Help menu, under Scripting Index; look stuff up there.   So those are three ways to let your user choose a file: the current data table, a naked Open, or a Pick File. There are other ways too, but those are three good ones to start with.
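Here is a rough sketch of that Pick File pattern. The window title, starting folder, and filter are stand-ins for whatever your script needs; the key point is that Pick File only returns a path, and Open does the actual opening.

// Pick File returns a path; it does not open anything by itself.
pf = Pick File(
	"Select Excel File",           // title shown in the chooser window
	"$DESKTOP",                    // folder where the chooser starts
	{"Excel Files|xlsx;xls"}       // restrict the user to Excel files
);
dt = Open( pf );                   // now open the file the user picked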
Now let's talk about letting your users choose which column they want to operate on.   Here I have something copied from the log: I opened a data table called prices and made a graph on it showing the price of apples over time.   I'm not going to show you the point-and-click part; I'm just going to run the script. I think you know how to extract script from the enhanced log already, so let's just see what the script does. It opens prices and there's my graph, apples versus date, the price of apples over time.   Now look at this data table, prices. Yes, there's apples, but I also have all these other columns too. So what if I want to let my user tell me which graph they want to see, which column they want graphed? That's what this section is about: letting your users choose.   We'll discuss two methods. Here's the first one: operating on columns that the user has selected.   We've seen this syntax before, I think in our second case study. This is how we make a list that contains the columns that are selected in the data table. So let me go back to that data table, select maybe three columns, and run this. Actually, I should run dt as well; let's run both of these.   Chicken, coffee, and eggs.   All right. So which one do I want to operate on? You know how to iterate over a list using a loop, but we're not doing that. I'm just going to use a shortcut and say, look, if my user selected more than one, just graph the first one for them. My user has selected chicken, coffee, and eggs before running the script; I just want to give them the graph of chicken.   That's the next line here. There's our subscripting: we're subscripting the cnames list to get the first item.   So we'll run this.   Now I have that nice variable, my col, that has the first item from the list, chicken, the column reference with the colon.   And now we can run our Graph Builder and go on with our lives. The only difference is we've substituted my col where it was apples up here. That's the only change we've made, and running this, we get a graph of the price of chicken.   Good.   Alright, so that's one way, but again, like we talked about earlier, this method relies on the user knowing what they have to do: they have to know to open the data table, select the columns, and then run the script, and that's a bit of a hassle.   So instead, it's a little more robust to show a column chooser dialog to the user at runtime. Here's what that looks like. The syntax is a little complex; I'm not going to get into all the details, just talk about it at a high level. We're opening the data table just as before.   We're using a command called Column Dialog to surface a column chooser to the user.   Again, it has a lot of arguments; I'm only using a couple. What I'm saying is, hey, give me a button that says Choose Commodity, and the user can choose at most one column for it and has to choose at least one.   So let's run this script starting at line 60.   It opens prices, here's my dialog, and look what a rich dialog I have for very little code. That's the virtue of the Column Dialog we're using. I'm going to have coffee as my selection and click OK.   Good. Back to the script, we just ran this line.   The result of our choosing is stored in dlg. Let's take a look at that.   All right, that's a little complicated. It says dlg is a list, see the curly brackets, and I've got some nested curly brackets, so it's like nested lists.   Here's the subscripting that gets you there. I'm not going to spend a lot of time explaining it, but you can copy this code if you need to do the same thing. To get from the dialog result to just the part that says coffee, a column reference to coffee, here's the subscripting that makes my col, so run this line.   And now my col contains just that reference to the coffee column, and this is identical to before: we run Graph Builder using my col, substituting it for apples, and we get coffee or whatever else the user has chosen.   Okay, that is everything I wanted to share with you.
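Putting both column-choosing methods into one hedged sketch. The variable, button, and column names below (cnames, mycol, commodity, :Date) are placeholders matching the description above, not code lifted from the presentation materials, and the Graph Builder call is a generic line chart standing in for the one built in the demo.

dt = Current Data Table();

// Method 1: operate on whatever the user selected before running the script.
cnames = dt << Get Selected Columns;     // list of the selected column references
mycol = cnames[1];                       // shortcut: just take the first one

// Method 2: surface a column chooser at run time instead.
// (In a real script you would use one method or the other, not both.)
dlg = Column Dialog(
	commodity = Col List( "Choose Commodity", Max Col( 1 ), Min Col( 1 ) )
);
mycol = dlg["commodity"][1];             // dig the chosen column out of the nested result

// Either way, substitute the variable where the hard-coded column used to be.
dt << Graph Builder(
	Variables( X( :Date ), Y( mycol ) ),
	Elements( Line( X, Y ) )
);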
I'm just going to close with a little bit of advice for anybody who's getting started with JSL programming this way. The advice is: let JMP write your code, as much of it as possible. That's what the enhanced log is for; steal that code from the log. That's most of your script, and then you just have to write the connective tissue, the parts that hold everything together and make it a usable production script. You can steal that code too: from the scripting index, and certainly from this presentation.   All the materials are attached to this presentation, and you can also steal from the JSL Cookbook in the JMP User Community, a fantastic resource that covers many common JSL tasks and gives you the code you need to accomplish them.   And again, I'll emphasize: really don't sweat the syntax too much. It's easier to grab the code and substitute what you need. You're not going to break anything, and you will learn the subtleties of the syntax with experience.   Thank you very much for your attention.
Peter Hersh, JMP Senior Systems Engineer, SAS Mary Loveless, Manager, Pre-Sales Support, SAS   Whether you want to sleep more, be more productive at work, leave for happy hour early or get an insurance quote, who doesn't want an additional 15 minutes? Pete and Mary walk through some of their favorite time-saving tips: Interactively build a repeatable data workflow with Action Recorder. Organize and clean up columns all at once. Identify changes between data tables. The semi-live session is an interactive session with a panel fielding questions from attendees.      The materials to reproduce the steps Mary and Pete walk through in their presentation are attached above.   Auto-generated transcript...   Speaker Transcript Peter Hersh Hey Mary. How are you doing? Mary Loveless, JMP I'm doing okay. How about you? Peter Hersh Yeah, making it through this hot and smoky summer. I saw you were out fishing. Mary Loveless, JMP I was, I was in Montana, that was hot and smoky, and then I came back to Boston and it was hot, and I brought the smoke back with me. Peter Hersh Well, I saw a picture of that fish you caught, so I think totally worth it, right? Mary Loveless, JMP Yeah, most definitely. Hey, I got something to share with you. I created this animated gif of my well plate information. And I wanted to share it with you, because you know I'm always thinking of a shortcut. What can I do? How can I simplify things? There's got to be a way for me to capture this, to be able to see it and capture the code and everything with it, so I don't have to write everything down, save everything to a script, and write notes in a journal about the different steps just to share it with my peers. So you've got to help me out, Pete. Peter Hersh Yeah. Yeah. Mary Loveless, JMP This is not my favorite activity. But I didn't know that... did you know that with the local data filter, you can animate? Peter Hersh Yeah, isn't that cool? That looks really awesome there, Mary. What are we looking at? Mary Loveless, JMP So we're looking at data over time, and I'm looking at a specific instrument, and I'm looking at the wells. This is a 96 well plate and I'm looking at the potency in each well over time, so I'm looking at exponential growth of potency in the well. I think animation is a great way to communicate and show people what's changing over time. And then eventually you could overlay some of these and see what's changing. Peter Hersh So tell me a little bit about your challenge. What kind of data are you starting with? I see your output here, but what process do you have to go through? Mary Loveless, JMP So I have a csv file; the data comes from the instrument, and I get a csv file. I bring that in, I open it up in JMP, and then I go through some data table manipulation, some column manipulation. And then I create the graph in Graph Builder and create this animated gif. It sounds easy when I say it, but when you sit down and try to write it out and document it, it becomes very tedious. So I had heard something about an enhanced log or an action recorder. So help me out, because you know I don't keep up with things. Peter Hersh Absolutely, Mary. Why don't I steal the screen here and first start out with a recap of what we want to do. 
So you start out with this csv file, you bring it into JMP, you have to do some data manipulation, and in the end you want to create a shareable object, and you're not a big fan of writing a script. So we're going to do this all without having to write a bit of code, because as much as we love bugging Brady and Mike and those coders to help us, it'd be great if we didn't have to do that. So let's look at how we can do that in JMP. The first thing we need to do is turn on that log, right. This is something that you might not have used in JMP in previous versions, because it was about scripting and seeing if your script worked, but this is what it looks like. And we can turn it on by going to View and Log. You'll also notice that when you... Mary Loveless, JMP Who knew there was a log? Peter Hersh What's that? Mary Loveless, JMP Is the log brand new? Peter Hersh No, the log's been there, it's just got a lot better in JMP 16. Mary Loveless, JMP So what did it do before, just record when JMP would crash or what? Peter Hersh Kind of. When you ran a script and the script broke, it would tell you where it broke, what happened, and what kind of error you got, but... Mary Loveless, JMP Oh okay, so it was more of a log that a scripter would leverage to see and track and debug. Peter Hersh Yes, exactly, and now it's become more general purpose. If you're used to, say, recording macros in Excel or something like that, this does much the same thing, in my opinion much more easily, because you don't have to turn on a record button; it's just doing it for you automatically. And again, to turn this on you go to View and Log. It's also in the home window by default, just hidden, so I can double click on that too and open it up. Gotcha. Okay, so let's start this process here. You gave me some of your data, and like you said, the raw data comes in a csv file. Right, let's start by just importing that csv file. So I'm gonna just take that and drag it to a JMP window. There you go. There's our first step in the process. We have our data. Mary Loveless, JMP Now did I need to have the log open? Did you just drag it onto the log window? Peter Hersh I could have dragged it onto any JMP window and it's just going to record what was done. Gotcha. And for folks that have been using JMP for a while, all that's captured here is the same thing that's in the table's source script. So that part was already captured, but this is just keeping a running log of everything we've done. Okay, so what's the next step now that we've imported? What do we need to do? Mary Loveless, JMP So it's a wide table. It's got time across, so I'd like to make it tall and just stack all the time columns. Peter Hersh So, stacked data. Okay. Mary Loveless, JMP Yeah. Peter Hersh There we go. Does that look a little better? Mary Loveless, JMP Much better. Peter Hersh So what do we do next? Mary Loveless, JMP Select the label column. Peter Hersh Okay. Mary Loveless, JMP And go to the Cols menu, Utilities, Text to Columns. And you notice that you have to put in a delimiter, so let's use a space; hit the space bar, because there's just a space. Peter Hersh Okay. Looks good. Mary Loveless, JMP And take Label and Label 1. Select those two and delete them. Peter Hersh Okay, so just right click here and delete. 
Mary Loveless, JMP And then for Label 2, I want to change the name in the column properties. Change that label to Time. Okay, and change the data type to numeric. Peter Hersh Okay. Mary Loveless, JMP And change the modeling type to ordinal, because the time is an order. Peter Hersh Got it. Awesome. Just hit Apply and OK. There we go. Mary Loveless, JMP And now this data is ready to go into Graph Builder so we can add the animation. Peter Hersh Okay. Perfect. Mary Loveless, JMP Open up Graph Builder and put location on the map shape. Peter Hersh Got it. Mary Loveless, JMP And put data on the color. Peter Hersh Alrighty. Mary Loveless, JMP So now that's just the summarized data for all of it on top there. What I want to do now is look at each time point. So let's go under the red triangle and select local data filter. Peter Hersh Alrighty. Mary Loveless, JMP Hit the plus and select Time. Okay, and I like to do a few tweaks here. Under the main local data filter red triangle, I like to turn off the counts, since we know it's 96 wells; that's down at the bottom. And under the red triangle next to Time, under display options, which I sometimes forget about, I like to turn on the check box display, because I like to have the boxes checked as I go through and animate. Peter Hersh Okay. Mary Loveless, JMP Now I think I'm ready. Peter Hersh Okay, I don't want to offend John Sall, so I'm gonna hit Done. Mary Loveless, JMP Oh yeah, you have to say Done. Peter Hersh All right, okay, and then we'll just apply one of these filters. And so now it's not summarized, it's a single time point, okay. Mary Loveless, JMP And now go to the main red triangle and select animation. Peter Hersh So cool. I don't think people use this enough. Mary Loveless, JMP Oh no, no. Peter Hersh I love to use animation. It makes it look like I'm doing work and very busy, you know, if I create one of these and just let it play. Mary Loveless, JMP It's my screensaver, so... Peter Hersh That's right. Impress your boss, right? Mary Loveless, JMP Right. Alright, so let's cycle through and animate. Peter Hersh Okay. And I just turned on that red triangle there to record, so it's playing and recording at the same time. And then, once it's cycled through, I'll turn that off, and now I can pause this and save it as our graphic, or whatever we want to call it. Mary Loveless, JMP Right, right. Peter Hersh And hit save. And there she is. All right. Mary Loveless, JMP Perfect, perfect. So now show me this enhanced log and show me how it's going to help me. Peter Hersh Okay, so you can see here in this enhanced log that everything we did to the data table was captured there. It imported, stacked, ran text to columns, deleted those extra columns, and then changed some column info. Mary Loveless, JMP Where's the Graph Builder? Peter Hersh Great question. Why doesn't the Graph Builder get put in there right away? Because I could still be making changes to it. It doesn't get added until I close this. Now watch. Mary Loveless, JMP Oh. Peter Hersh There we go. So now everything I need to create this is right here. Just to show you how this works, I'm going to close these two so we know I'm not cheating, and then I'm going to clear out those last two closed entries. So now we have everything from the import to the Graph Builder. Mary Loveless, JMP Right, right. 
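As a rough JSL sketch, the steps the log captures here would look something like the script below. The file path, the time column names, and the Graph Builder roles are assumptions based on the description above, not the actual captured script; in the demo the stacked labels also carried a prefix, which Mary split off with Cols > Utilities > Text to Columns before deleting the leftover label columns.

// 1. import the instrument csv (hypothetical path)
dt = Open( "plate_data.csv" );

// 2. stack the wide time columns into a tall table (column names are placeholders)
stacked = dt << Stack(
	Columns( :"0"n, :"15"n, :"30"n ),      // ...one column per time point
	Source Label Column( "Label" ),
	Stacked Data Column( "Data" ),
	Output Table( "Stacked Data" )
);

// 3. rename the label column to Time and make it numeric and ordinal
Column( stacked, "Label" ) << Set Name( "Time" )
	<< Data Type( Numeric )
	<< Set Modeling Type( "Ordinal" );

// 4. Graph Builder: well location on the map shape, potency on color,
//    with a local data filter on Time to step through the plate for the animation
stacked << Graph Builder(
	Variables( Shape( :Location ), Color( :Data ) ),
	Elements( Map Shapes( Legend( 1 ) ) ),
	Local Data Filter( Add Filter( Columns( :Time ) ) )
);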
Peter Hersh All right, and so now the easiest way to do this is to go under the red triangle and save the script to a script window. Mary Loveless, JMP Uh huh. Peter Hersh Now you can see all those steps from... Mary Loveless, JMP Wait a minute. So I see the stack, I see the text to columns. Oh, and wow, it actually puts a comment string above each action. Peter Hersh Yes, yes, that's really handy. Mary Loveless, JMP Nice, nice. Peter Hersh It's like having a good coder in your back pocket, not just me forgetting to comment things. This actually tells you what it's doing, and this is a great gateway to being a scripter yourself. You can easily tweak this, and you can see how each step was done. If you forget how stacking or text to columns is done, it's all right there. So now that we've saved this, let's just run it and make sure it does what we want. There we go, and now you can see, if we want to recreate that animated gif or just have this JMP report, all we have to do is hit play there. Mary Loveless, JMP Awesome. That is awesome, and that's just new in JMP 16? Peter Hersh Yeah, in one release they created all this. Isn't that awesome? Mary Loveless, JMP Wow, I think that's really neat. So now that I have this automated process for myself through this enhanced log, how do I share it with people, and how do I document it if I want to do any record keeping or management like that? Peter Hersh I am so glad you asked that question, Mary. Those are great questions. So two things. I can just save this script and send it to anyone, and they can run it as well. The one thing you might have to do is make sure they're navigating to a csv file that's on a shared drive or someplace they can access. Mary Loveless, JMP Yeah, got it. Peter Hersh But the other thing I can do is... you'll notice under this red triangle... well, let me clear out these extra steps, because we don't want to capture those. So I'm going to clear those, and then I'm going to go here, and under the save script option, the final thing is make a data table. Mary Loveless, JMP Ah, oh. Peter Hersh Yeah. So you can see here each step that was done, then the message about that step, and then it tells you the result and the JSL; it's all captured here. Mary Loveless, JMP Oh, that's really nice, because as part of the documentation or record keeping, or just for quality, you have a record of the script, the actions that were recorded against that particular data. Now one additional question, Pete: I noticed that you had to go in and remove some things that we didn't want in there, because we did some extra steps. Is there a way to turn it off and not have it recording, just capture what we have, or is it always on? Peter Hersh Yes, so I can tweak these options here individually. I can also set my preferences under File and Preferences, and all the way at the bottom there's this Log section, which allows me to turn things off or on by default. So, like you said, I can have it stop recording or only record certain parts, like data table operations and such. Mary Loveless, JMP Oh, you have made my day. Peter Hersh That was easy. 
Mary Loveless, JMP That was easy, and here I was figuring out these interesting steps to put together, and now I can actually record them, and then I have ways, through the data table or the script itself, to deploy it to my team and let them leverage it, all in less than five minutes. Peter Hersh Yeah, absolutely. So did you have any other questions about this or anything else? Mary Loveless, JMP Yeah, my other question that comes to mind in thinking about the data table and bringing it in: we have lots of different data tables that we're bringing in, and we're always operating on them in this shared space. Is there a way we can do a quality check on a data table against the final outcome or output that I want, compare it and make sure it's the most recent, so I'm not using one that's older? Something that shows changes in data tables. Peter Hersh Yeah, absolutely. So let's close this and I'm going to clear this. Let's run this script again here and we'll just generate that. I'm going to close out of this graph and close out of this, and what we want to see is, is this data table the same as the data table that you sent me? Did we make any other changes that we maybe forgot we made? So I'm going to grab that and look here. You had sent me this data table here, and here it is over here. We want to see, is this the same as this, or was there another step that we added and changed in one of these two that maybe we forgot to document? So let's take a look at this data and compare it to that. And we can do that under Tables, Compare Data Tables. Now this has been here for a couple of releases. Mary Loveless, JMP I know, but it's never been... it just says, yes, it's different. Peter Hersh Yeah, right, which wasn't super valuable. So let's take a look at what it does in 16. Alright, so we're going to hit compare. And you can see here, it's comparing the stacked data, the one that we made with that script, to this plate test table, which you sent me and which had been tweaked. By default, JMP is going to see the columns with the same names and compare those. Mary Loveless, JMP Right. Peter Hersh We can see here that this last one, data, which is actually potency, we didn't change the name, so it didn't link automatically. But we can tell JMP, hey, these are the same, we want to link them. All right, and then we have a bunch of options here, but the first thing I'm going to do is just say, let's compare these row by row. We know that they're all in the same order, so I'm going to hit compare. And you can see here that it's found 900 differences. It says each cell is a little different, but if we look at this, it's just a difference in significant figures, right. This one looks like it has, you know, 10, and this one has 11, or something like that. Mary Loveless, JMP Right. Peter Hersh So it's not really different, but JMP has flagged those because they're slightly different. What we can do up here is allow for some relative error, which says, okay, if they're that close, we don't want to call them different, and now we can see that the data is no different with that tiny amount of relative error. 
Mary Loveless, JMP Now we've got that difference, but what happens... I mean, we've got some different scripts in the table, we might have added spec limits; there might be some different properties. So if you go to column info Peter Hersh Yeah, yeah. Mary Loveless, JMP and look at the properties, is everything that's under column properties there? Do you catch everything? Peter Hersh That's a great question. So if you notice here under the red triangle, by default the only thing on is the data, but I can compare table properties and column properties. Mary Loveless, JMP I gotcha. Peter Hersh This table has a spec limit and a map role, and those were not in the data table that we created with the action recorder, but they're here. So we can see, okay, the data is similar, or even the same, but we have some properties that weren't captured. And we can also, like you said, compare table properties, so we can see whether there's an additional source or something like that, and you can see... Mary Loveless, JMP I see. Peter Hersh That's captured. Mary Loveless, JMP Is there a way to save this to a data table, to a script? How can I save this information if I have to use it for quality or validation? Peter Hersh You can save it to the clipboard, to a data table, to a script window, or to a journal like this one here, right. And the nice thing is that the journal script, or wherever I saved it, just opens up and compares those two data tables if they're open. I happened to close the wrong one, so... Mary Loveless, JMP Okay, all right. Well, that's pretty neat. So these two... right now there's the enhanced log action recorder, which I love, that it's recording actions, and the other is compare data tables. Peter Hersh Yes. Mary Loveless, JMP And that's pretty neat. I probably have to do a little extra reading just to understand how to navigate the report window, but just to be able to go through and see what properties are different, especially when somebody shares a table with me and they've added spec limits, ordered columns, and added some interesting properties to the data table, it allows me to see the differences. As long as the data integrity is the same, you can have those attributes or properties added. Peter Hersh Absolutely. Mary Loveless, JMP Well, this has been really insightful, and you have given me back at least 15 minutes in my day. Peter Hersh Perfect. Mary Loveless, JMP I can actually add a little extra time to lunch. I was going to say quit early, but I don't want my boss to hear that. Peter Hersh Yeah. Hopefully Ian's not watching. Mary Loveless, JMP Thanks, Pete, it's always good to chit chat with you and see what's new and what's going on, and hopefully people will get as excited about the enhanced log action recorder and compare data tables as I have. Peter Hersh Yeah, absolutely. Well, thanks for the time, Mary, and hopefully this was useful. Mary Loveless, JMP Oh, it was. Always. Thanks, Pete. Peter Hersh Bye bye.  
Monday, October 4, 2021
Hadley Myers, JMP Sr Systems Engineer, SAS Peter Hersh, JMP Senior Systems Engineer, SAS   Accessing data is often a very time-consuming and aggravating step of the data workflow. It can delay solutions being implemented or allow problems to persist undetected, often with disastrous consequences. Why can’t someone create an “easy button” for data access, allowing you to pick your data source and filter down to what you want? Could it be used to combine data from multiple sources, if needed, and even automate reporting so that issues are flagged immediately rather than at the end of a lengthy and tedious “data munging” exercise? In this talk, we show how to create an interface from JMP’s application builder utilizing SQL, and then build a WebAPI example from scratch. The methods shown could also be equally applied to data stored in other formats. The final application will make data access as easy and fast as pushing a button.     Auto-generated transcript...   Speaker Transcript Peter Hersh hey Larry.   Oh I'm not connected, can you hear me now.   Oh.   let's see I can't hear you.   Something where is it running through.   yeah yeah there you are. Larry LaRusso Okay, I can hear you.   that's really weird.   To me.   It was working fine so who knows, maybe.   Maybe. Peter Hersh If you do three ones bound to fail.   The law of averages. Larry LaRusso and actually maybe I'm living a better life. Peter Hersh Total totally usually it's like to will fail. Larry LaRusso In fact, the last zoom meeting I have not even sure if it.   If it recorded properly because I always get a notification that.   It got recorded and I didn't get one, and now I went into the meaning it doesn't look like it did. Peter Hersh So no. Larry LaRusso Well it's worse than that it's an external guy.   he's retired, you know. Peter Hersh I name sounds familiar I don't.   I don't know him personally. Larry LaRusso Rochester.   Kodak and then Boston mom and he's super nice he's retired he's kind of working on his own time or whatever, but so he probably won't be the end of the world, but.   still looking forward to going back to him and saying Oh, remember that. Peter Hersh Remember what you did there. Larry LaRusso I'm going to check with Jeff and just make sure that I'm not missing something, but I don't see this one has a little recording thing, and I could have sworn in as well, so.   yeah. Peter Hersh yeah I'm just closing out of all those things like teams.   They had. Hadley Myers hey. Peter Hersh I like I like the guitar that's nice touch. Hadley Myers Well it's it, I wonder if I was just thinking that I don't have a background here, I wonder if I should add one. Peter Hersh No, no, no, keep it there and let's do the start and you'll just be playing the guitar and I'll be like had had someone's here.   Sorry yeah.   yeah. Peter Hersh yeah it's way better than.   The JMP by default here. Larry LaRusso All right, you have you guys have done.   I mean I don't think I've seen you before but. Hadley Myers I think we've been introduced. Larry LaRusso yeah. Hadley Myers Great nice to see you again. Larry LaRusso yay good to.   See you in person, so I'm I'm going to pull up the list, but you guys have done this before, so you know you know the deal here but.   I think we're we are automatically recording release it says that upper left hand corner so we're going to splice it here and there.   
I'm going to go ahead and mute myself and get rid of my video just to make sure that there's no way I pop up here anywhere, and if you guys could just kind of give me.   A very obvious, thank you for joining us today are welcome, or whatever you want, so they started and then, if you close it down deep you know you're doing the live version. Peter Hersh No we're not. Larry LaRusso Okay okay so just close it out with thanks for joining us today, so no window to chop it, but the backgrounds look great.   So I don't know that I have a whole lot of other things, the same and just whenever you're ready, and if you guys feel like there's just something that you complete they can't live with and want to start again that's that's totally. Hadley Myers Fine, the pizza john wait wait wait wait pete's con. Larry LaRusso O p. Hadley Myers We lost him.   let's wait a second for him to come back.   In the meantime, I want to make sure that I'm using the right audio.   And it doesn't look like I am I want to be this sounds good thing I checked how's that is that better. Larry LaRusso Yes, a little lower. Hadley Myers doesn't really. Larry LaRusso yeah the. Hadley Myers Better. Larry LaRusso That sounds good. Hadley Myers Okay last API we got worried. Peter Hersh And that was a fun trick. Larry LaRusso Well now, now I'm not one of three we talked a little bit.   But that was. Peter Hersh yeah that was on me, I was trying to hide my taskbar and, for some reason zooms thought that match shut down.   So. Hadley Myers I think my taskbar it's OK, with everyone because I just have JMPed in one I don't know why I don't see why should.   yeah I think there's a fire anybody if I keep it. Peter Hersh zoom might zoom might shut down. Hadley Myers All right.   Shall we just practice switching screens, to make sure that we can. Peter Hersh Okay I'll share mine first.   Make sure I can do that. Hadley Myers And we keep our cameras and mike's own when we're not presenting right and. Peter Hersh You guys aren't seeing this up here right you're not seeing.   Or are you. Hadley Myers What are you putting up. Peter Hersh So I'm sharing and then I see the zoom window up here, but you guys don't see that right.   You just see a PowerPoint is that that. Hadley Myers Will I see a PowerPoint in not in presentation mode. Peter Hersh yeah yeah.   I would present I will go like that okay. Hadley Myers No that's all I see is a PowerPoint. Peter Hersh Okay perfect okay so, then I would go blah blah blah.   And then.   Then I'll say okay JMP DEMO hand it to Hadley So do you want me to stop sharing or can you just steal it from me which. Hadley Myers let's try, I think I need to.   let's see.   yeah if I just do that does that.   yeah you can see my screen. Peter Hersh yep. Hadley Myers stole it from me, I can show you somebody can go up here pull up my dashboard whatever. Peter Hersh yep. Hadley Myers And now, can you grab it back. Peter Hersh yeah Let me try.   Okay, then blah blah blah.   All right. Hadley Myers I think allison. Peter Hersh Okay perfect. Larry LaRusso Good. Hadley Myers So you start right. Peter Hersh Yes. Hadley Myers All right. Larry LaRusso I'm going away guy.   Okay yeah and.   You know that they're recording the speaker view as well, so we're getting we're doing individual speaker slip slide and speaker just slide so just know that you have the potential to be on screen when you're not talking but you guys are.   All right, okay I'm gone guys take it away whenever you're ready. 
Peter Hersh Alright, well, thanks for joining us today. Hadley and I are going to be talking about data access and a way to streamline it and make it a little easier using the JMP application builder.   So here is an overview of the analytic workflow inside of JMP, and the first step in any analytic journey is to access your data.   Oftentimes this can be a time-consuming and tedious process, and it's often overlooked as a critical step that needs to be done before you can do any of the learning from your data: you need to be able to access it. So Hadley and I are going to show a couple of tricks for accessing data from different sources and hopefully ease the pain of that.   What we're going to show today is creating an easy button, which is basically using the app builder inside of JMP to access data externally, either through an ODBC connection or a web API.   You can also apply this to local files or any type of data source that JMP can access. Then we'll also show how we can combine sources and apply filters to multiple different sources in the data pull. And finally we're going to wrap up by showing an interface for entering data manually, but making that process a little easier, and then, in the end, how we can distribute anything that we make throughout the organization as a distributable add-in that we can share. So we're going to start out with Hadley showing us how to connect through the ODBC driver. Hadley Myers Yep.   Thank you very much, Pete.   Hello to everyone watching this, wherever you are. So I'm going to start off by showing you this app here that was made using the application builder, and what it does is connect to a database.   You've got, perhaps in your own system, different locations or different servers where your data is stored, so you can select the appropriate one. And what this does is it allows you to search and filter for tags. What you're seeing here is one SQL query. We've reached into the database, we've pulled out some tags, I can select the ones that are appropriate.   I can choose the time period that I'm interested in looking at; in this case, the database goes back quite a ways. Run that, and immediately pull in the data that I need. How we've seen people work with these is they take them and put them in the corner of their desktop, like this, and anytime they need data, say, oh, I should have gotten the 102s as well, that's no problem; I can just go pull those, like that.   So this is the type of thing that we're going to show you how to build today, and the example that we're going to use is a little bit simpler than this.   You can see that here. Simply select the time period, choose one of the tags that you'd like to search for, in this case an analysis method, which is a filter on one of the categorical columns, and then pull the data in. So the more complicated one you saw is just an extension of adding functionality to this.   We're going to start out by using the query builder in JMP.   Once we've connected to our database, which I've done here already, I'm going to go in and select the table that I'd like to start with. In this case, it's my production data. I'm going to join my result   and my sample data here. So let's go ahead and build this query. I'm going to start by pulling in my lot. I'm going to pull in all of my production data.   
I'm going to put in the sample date created, and the analysis, and perhaps I'll take these and put them near the top, so that you can see them a little better. So those will be the first columns, as well as my result. Now, if I were to set up this query using the filters, what we could do is, for example, filter on our analysis by putting that there, and filter on our sample date here, and maybe we'll change that to a range, so we can grab data between one date and another date. Normally, what you would do in this case is prompt all on run, so that when you ran the query, your user could select the appropriate dates, and again, I think this database goes back a fair way, so we'll just choose a very broad range there. Press OK. And now I've got my data for that analysis and those dates, but what I'm going to do is unprompt these, so clear prompts. I'm going to manually select a few, just so I can see how these fit into the query, into the code.   Yeah.   So I can go ahead and copy this run script, which executes this query, including the filters. And now what I'm going to do is open application builder.   I'll choose a blank application and create a button.   When I right click on the button, I can choose what script I want to run when the button is pressed, and in this case we're going to simply run the entire run script from our query.   So now, if I run this and press the button, I get my data in exactly the way I set up the query. But what I would like to do, and perhaps what you'd like to do, is allow a user to choose the query filter variables, so which time period and which analysis, and for that it's amazingly straightforward.   What I'm going to do is grab two number edit boxes for the start and end times of my query.   And what I always like to do is put these within a panel box. It just makes it a bit easier for the user to see exactly what they're doing; it allows me to put an instruction. So I can write here something like, enter start date and enter end date.   Like that.   And we'll give these variables names that we can understand. So if I click on the number edit box, I can name this one start number edit box, and we'll call this one end number edit box, and we can also address the formats so that we use our date and time calendar format here.   Just like this.   And, of course, what we need to do is get the information from these data entry variables into our query. So when the button is pressed, we'll create a variable called start and make that equal to whatever value is typed into our number edit box: get that value. We'll get the value from our end number edit box the same way. And we'll take these and insert them into our query here.   So if I were to run this, I can choose my start date and choose my end date, run that, and get my data. Now I'll show you one more very quickly: how can we filter on this analysis? For that I'm going to use one of the different options for buttons I have. I think I'm going to include this check box here, and I can put in the values that I'd like them to select on. Maybe we'll rename this to analysis. So here I'll simply create another variable, analysis.   And if I can't remember what command I need to use to get the selection out of this, all I need to do is click on it, and if I right click, a nice trick is I can go right to the Help menu, right to the scripting index, and I can see that what I need is the Get Selected command.   I'll type here: Get Selected.   And now we can take this analysis and replace our query parameters right in there. So if I run this, once again I can select my start time and my end time, and I can choose one of the analysis parameters to run, or the other, or both.   
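A bare-bones stand-in for that interface, written here as a plain New Window rather than an Application Builder app (App Builder generates equivalent display-box code for you). The box names, panel labels, analysis values, and the Show() at the end are placeholders; the real button would splice start, end, and analysis into the saved Query Builder run script.

nw = New Window( "Database Access (sketch)",
	Panel Box( "Enter start date",
		startNEB = Number Edit Box( Today(), 20 )    // named so the button can read it
	),
	Panel Box( "Enter end date",
		endNEB = Number Edit Box( Today(), 20 )
	),
	Panel Box( "Analysis",
		analysisCB = Check Box( {"Method A", "Method B"} )   // placeholder analysis names
	),
	Button Box( "Run Query",
		start = startNEB << Get;                     // values typed into the boxes
		end = endNEB << Get;
		analysis = analysisCB << Get Selected;       // list of checked analysis names
		// ...substitute start, end, and analysis into the saved Query Builder
		// run script here (omitted), then run it.
		Show( start, end, analysis );
	)
);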
So the next step... of course, I can add to this as much as I want, but what I'd like to do at this point is distribute it within my organization, and the trick here is that anyone who has access to this database location will be able to make use of this add-in if I share it with them. So what I'm going to do is click here and save the script to an add-in.   We'll call it database access, and dbaccess1, just to give it a unique name.   Access Data. And the entire script needed to make the add-in work is located within here, so it's already all done. I'll save that to my desktop.   And as you can see now, it's installed right here.   So all I need to do is take this add-in, which I think is saved on my other screen.   Here it is, and I can email it to anybody within my organization. I've written this email ahead of time for your convenience. Email that, and if they have access to this database location, they'll be able to install the add-in and use it right away. And at this point I'd like to stop sharing and move back over to Pete, who will show us how to access data from the web. Peter Hersh All right, thanks, Hadley, and yeah, great demonstration. So in this part we're going to build off what Hadley has shown.   We're going to end up with an add-in that looks like this, going through the same process that Hadley did. I'm going to skip the ODBC part and show an API part, but the idea here is, I have two different data sources; I'm going to pull data from both of those sources, join it together, and then create a final data table like this. We're going to start with what Hadley had shown already completed, which is a simple ODBC pull with even less detail than Hadley had, where we just have data on two different stock types and one filter used to select the stock type, and what we're going to do is build in an API connection. So I'm going to go through a similar process to Hadley's. He described it so well, I'm going to go a little quickly here and add a panel box with a text edit box in it, and like Hadley showed, we can reference these boxes because they have a name. I could go through and rename these like Hadley did.   In this case, this one is just called TextEdit1, and what I'm going to tell people to do is enter their API key in here. We'll use that API key so each individual person can access data using their own key, so we're not distributing the API key along with this. And then, just like Hadley showed, I'm going to create a button here, and we'll call this button   API data pull.   All right, and again the same thing: we're going to right click on here and add a script on press. 
So where we're accessing this data from is this website called Alpha Vantage, and this is just a good way to demo and try out an API data pull. It has very nice API documentation. If you were going to do this inside your own company, you would use slightly different commands, but the general idea is the same: you have to pass a few things from JMP to that API to pull data in. Here, I've prewritten this script. Basically, I have to pass in a key, I have to tell it the URL and what type of request I'm going to do, and then I have some more script down here that works on getting the data into the right form for JMP, so parsing that JSON. I'm going to select this, which I've already written, copy that, and then, just like we've seen before, if I put it in here, now I have my data pull in there. A couple of things I need to do here: I actually want to reference that API key. So I'm going to say, okay, that key equals textedit1, and just like Hadley showed, you could go through that help index and figure out what is needed to grab that text, but in this case I'm just going to say the key is: go to that textedit1 and get the text. The other thing is, I want to reference this box here, which happens to be called Combo1, to filter on. So I'm going to add that down here as well, and we'll just say that stock equals that variable name, Combo1, and then, again just like Hadley showed, Get Selected.   Alright, so now my query is going to grab that key, grab that stock, and then go out to the website and ping it, so let's go ahead and run this to show it. I'm going to use a demo key here, and we'll maybe change the stock.   Oh, you know what? That's a good thing to point out: I forgot to change that in the code here, so I'm saying the stock is Combo1, but over here it's telling me that my symbol is Microsoft. If I don't put in that stock variable there, it won't do the pull properly. So let's show that one more time, to show you that it is now looking at the filter and not hard coded in; now it's pulling the right data.
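Roughly, a scripted web API pull like the one Pete pastes in looks like the sketch below, using JMP's HTTP Request object. The Alpha Vantage parameter names are illustrative; your own API's URL and parameters will differ, and the JSON parsing depends entirely on the shape of the response. The join captured from the log, plus the Wait and Close calls in the final easy button described next, are sketched as comments at the end.

key   = "demo";                       // in the app this comes from: textedit1 << Get Text
stock = "IBM";                        // in the app: combo1 << Get Selected

request = New HTTP Request(
	URL( "https://www.alphavantage.co/query" ),
	Method( "GET" ),
	Query String(
		["function" => "TIME_SERIES_DAILY",   // illustrative Alpha Vantage parameters
		 "symbol"   => stock,
		 "apikey"   => key]
	)
);
json = request << Send;               // response body as text
parsed = Parse JSON( json );          // nested lists/associative arrays to reshape into a table

// To combine this with the database pull (the "easy button" described below), the
// captured join plus a short wait and cleanup would look roughly like:
// Wait( 2 );
// combined = dbTable << Join( With( apiTable ),
//     By Matching Columns( :Date = :Date ), Output Table( "Combined" ) );
// Close( dbTable, No Save ); Close( apiTable, No Save );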
Okay, so now the next step is I want to actually grab data from both of these sources and bring them together. There's a nice feature inside of JMP 16, which hopefully you've all heard about by now, which is this new, improved log, and I'm going to go ahead and clear that out.   What this log does is capture any type of data manipulation you might do. So let me show you an example of a join. I'm going to join together these two data tables, and you can pick how you want to join, what columns you want in the join, and maybe name your output how you want to. Once you're happy with this, you can hit OK. And you'll notice in that log that it has captured everything I need for that join, so I'm going to go under here, go to save script to the clipboard, and then go back and create a new button in my app here. Again, I'm going to use a button box, but you could pick whatever type you'd like. Maybe give it a name; we'll call this one join.   Right click on here, do that script on press, and then just copy and paste.   And there, now it will go through and do the join. So really we could pull each individual one and join, but I think the end goal is to do this all in one step.   All you have to do is grab all of these scripts into one place, which I happened to do ahead of time, and then create a new button. I'm going to show you two things I added to this script once I put it in there. We'll call this the easy button, to stick with the theme here.   Easy button.   I'm going to grab this script here, so I'll select all of this, copy it, and then do the same procedure here.   And paste that.   Alright, so a couple of things I added to this script, beyond just copying and pasting what was already there. One was this wait command, which gives the data pull a little bit of time to finish before the join starts, and then I'm closing out those two additional data tables, so I'm not just building up multiple tables. So let's look at this. I'm going to put in my API key and select my stock. I'll hit the easy button: it pulls from two places, waits for two seconds, and then joins them together.   So that is the end goal here, and now I'm going to pass it back to Hadley to show a good example of how to make data entry a little smoother. Hadley Myers Yeah.   Thank you very much, Pete. So we've shown you now how to get data from an external SQL database, and Pete showed you how to get data from a web API and combine it with some data that you've already got. It could be that your data is in some other source, maybe you need to access it using Python, that's all fine, or it's in different files, but it also could be that you need to input the data manually. This is an example of a data set here, where we've got cereal names and manufacturers, and all of this data has been entered manually from the labels of these boxes, which unfortunately aren't in a format where the data can be easily pulled in automatically. So somebody's going to need to do this manually. I used to have a job like that once, so I can definitely sympathize with the person who needs to do it.   Now, you could have it entered from the table like this, but what I've done is make it a little simpler by creating this application here.   You can see that they can enter it simply from one screen, and then it will be added to the table. I have intentionally left out the last two columns; the method for doing them is exactly the same. You see here that the first thing the button does is add a row to the table. It figures out how many rows the table has, and then, for each column and that row number, it simply grabs the data from each of our inputs and saves it to each of those cells. So let's go ahead and do it for the last two.   You might have noticed that this is quite even and nicely spaced out; that was done using these nice splitters here, the horizontal and vertical splitters. So I'm going to go ahead and add a horizontal splitter just at the bottom here.   To this, we will add a panel box and a number edit box at the bottom, and I will add yet another panel box and another number edit box. We'll call these our last two columns, which are weight per serving and cups per serving from our cereal labels.   Now, I could rename these boxes, but I won't do that for now. All I need to do here is go ahead, and in our data table column, weight per serving, and our row number, 
I will make that equal to our number edit box that we just created.   The same thing here   for our   cups per serving Hadely Myers under edit2. Hadley Myers Right, so now, if I were to run this, I can put in all of the data and then get it   into these   cells or to add the row at the bottom, but I'm going to show you one last thing, which, if you're creating this for someone, I promise you they will greatly appreciate it, that's to add   a little bit of color to the picture just over there to the...to the application. So let's go ahead and you can add a company logo, or you can just add a stock photo like I've done here.   So it doesn't have to be perfect. If we just add it in like that, you can see that when we run this, it makes somebody's day just a little bit nicer than it was before. Now, rather than saving this as an add-in or saving it   to a script, what I'm going to do is go ahead and save this right away to the data table.   Call it add   data or add row, whatever and now you can see here we've got our add-in right there. So I can run this add-in   cereal. The nice thing is that if I press tab, it goes to the next one.   I'll just add some pretend data   and so on. Right, add that row and you see that.   So, hope you found that interesting, hope there's an opportunity that some of you have to be able to use these or create these. And with that, I will pass things back to Pete. Peter Hersh Great, thanks, Hadley.   I am just going to wrap up here with a couple more   slides. So summarizing what we saw here, there's oftentimes difficulty with accessing data, and that means problems can persist longer than they needed to. So we hope that you found some benefit out of this and can streamline that data access or even data entry.   And   getting... getting your data in the right format and   easily accessible is about 80% of your solution, so usually that's where you spend 80% of your time. So if you can streamline that, it can be a huge time savings,   and have your smart analysts and scientists and engineers working on the data problems, not not data access problems.   And we showed that creating easy buttons is pretty straightforward inside of JMP's application builder, where you're able to choose a source, filter, combine multiple sources, and even add data manually.   And then, like Hadley showed, we can distribute that throughout the organization with an add-in or we can also do that with just a script or an app itself.   So thank you for your time. Hope you guys found this useful and we will answer any questions that you guys put into the chat down below. Thank you. Larry LaRusso awesome thanks guys.   Well, you haven't been happy with it. Peter Hersh We kept it under a half hour, we were.   When we time this out, the first time we we had to take a little time off so.   We figured since it's not semi live it's.   Probably not a big deal right like we.   We can have a be 28 minutes instead of 24 or whatever it was supposed to be so. Larry LaRusso It was great so great so I'll go ahead and get this.   out to the Jeff and his team, to make sure they've got everything if there's some issues will let you know, but I suspect you're in here. Peter Hersh Alright well we'll see if the recording work this time. Larry LaRusso Oh, my God, I know it did this time.   Actually, I like I said it's it's blinking up there, recording so I'm assuming it's good badly, I was telling you before you joined us I data recording with an external guy yesterday and I'm almost certain that didn't record.   
I've got to go back to them it's going to be a terrible discussion and I'm super nice he's retired, so I think he has a little more time, but if it's gonna be. Hadley Myers Good luck with that so. Larry LaRusso Luckily, he was a very, very, very nice guy it seems like you're probably pretty well but it's pretty.   Alright guys. Peter Hersh Well, thanks Larry thanks. Hadley Myers guys up as well excellent. Peter Hersh perfect one, take all right bye. Hadley Myers guys see ya.
Sarah Springer, Account Executive, Global Premier Team, JMP Biljana Besker, Account Executive, Global Premier Team, JMP   Do you want to build an analytics culture within your organization? We discuss how to develop an analytics strategy and advocate for it. There are facets to an analytics culture that require significant change. This change must begin with leadership and advocates within the organization who can set the tone and lead from the front. The analytics advocate must work to promote data as a strategic asset.    This presentation addresses how analytics advocates can facilitate change, overcome resistance to change, promote collaboration, and educate and empower their workforces. They must find additional stakeholders to help execute a unified vision for change within the organization and adopt a plan that can result in an educated workforce to foster a successful analytics program.   As a tool to help upskill your organization’s workforce, this presentation also outlines and highlights unique ways companies can use content from Statistical Thinking for Industrial Problem-Solving (STIPS), an online statistics course available – for free – to anyone interested in building practical skills in using data to solve problems better.     Learn more about being an Analytics Advocate: Next steps for JMP Analytics Advocates How to Become an Effective Champion of analytic | The Evolving Analyst Every Organization Needs a Data analytic Champion | CIO Insight Dedicate an analytic Champion | ITauditSecurity (wordpress.com) Who Needs an analytic Champion? | GoodData Developing an analytic Strategy: The analytic Champion | Elder Research  analytic Assessment: A Blueprint for Effective analytic Programs « Machine Learning Times (predictiveanalyticworld.com) Getting Employees Ready for the New Skill Needs of 2021 and Beyond    Auto-generated transcript...   Speaker Transcript Gail Massari start. Biljana Besker Hi! I'm Biljana Besker, and I'm a JMP Account Executive for Global Premier accounts. My colleague Sarah Springer   and I will introduce you today to how to become an Analytics Advocate in your company, and what resources we have available at JMP to help you get there.   So let's start. What defines an Analytics Advocate? The Analytics Advocate must be an advocate and a change agent who spreads the analytics strategy and fosters an analytics culture in which everyone is comfortable using data-based insights to improve the quality and effectiveness of their decisions. Five characteristics to look for in an Analytics Advocate are:   First, credibility.   They are trusted and well respected because of a proven track record of managing difficult projects to successful completion.   They have empathy, because they listen to and address fears and resistance to change as new steps are taken on this unfamiliar path.   And, of course, they are problem solvers. They are willing to roll up their sleeves and work to overcome technical and cultural challenges that arise through each stage of implementation.   They always show commitment. They support the analytics strategy and promote a consistent interpretation of the goals for analytics.   And they are flexible. Data-driven decisions require ongoing evaluation of their effectiveness. An Analytics Advocate must recognize when a part of the analytics strategy is not working, and work with all parties to redefine the solution.   What is considered to be the Analytics Advocate's role?   
As an Analytics Advocate, you must promote data as a strategic asset, and you have to address resistance and promote collaboration.   And there is a big need to promote the culture of evaluation and improvement and to educate and empower the workforce.   So assets do not necessarily have essential value. And assets are associated with liabilities, so how to promote data as a strategic asset?   Analytics is about having the right information and insight to create better business outcomes.   Business analytics means leaders know where to find the new revenue opportunities, and which product or service offerings are most likely to address the market requirement.   It means the ability to quickly access the right data points to find key performance and revenue indicators in building successful growth strategies, and it means recognizing risks before they become realities.   So how can you address resistance? There are three levels of resistance you must overcome.   The first and most important level is the C-Level resistance. Preparing the technical infrastructure for an effective analytic program biljana Besker may require significant resource investment for an unknown return. This should be addressed, where possible, with a small project requiring minimal infrastructure to secure a quick win with positive expected ROI. Biljana Besker If this is not possible, then show examples where others in the same industry have benefited.   Second, is the department-level resistance. Process owners may resist the perceived effort associated with data governance processes needed to make data cleaning sufficient to support analytics.   The Analytics Advocate must find ways to show how such efforts will result in securing long term benefits to the organization biljana Besker that will turn in rewards and recognition for the department. Again, quick win projects can help; however, the Analytics Advocate should not stop there. Important tasks are best accomplished with a dependable ally with shared interests. Biljana Besker And last but not least, we have the frontline worker resistance. As business process owners, frontline workers are not interested in extra work, as we know, biljana Besker if it's not reflected in the metrics used to access their performance. A smart Analytics Advocate addresses the question "what is in for me?" Biljana Besker Integrating analytics solutions into existing workflows reduces incremental effort and empowers frontline workers to make more informed decisions and improve job performance.   So how to become an effective Advocate of Analytics.   As an analytic analyst you are obviously aware of the power of data analysis. You know that data application of appropriate analysis techniques biljana Besker to a well constructed, meaningful data set can reveal a great deal of useful information, information that can lead to new opportunities, improvements and efficiency,   reduction in costs and other advantages. While many organizations have adopted analytics on a wide scale, several others still employ it only in certain areas, and some, believe it or not, rarely use it at all. Biljana Besker If you often get excited thinking about new ways of applying analytics in your organization and are eager to share your excitement with people you think would benefit of analytics, you are in a good position to become an Analytics Advocate in your company.   So, first focus on the person's greatest challenges and most burdensome tasks.   
Everyone has something about their job that is a source of frustration, no matter how much they love what they do. For the person you're working with, a meaningful application of analytics is one that relieves his or her frustration or minimizes it as much as possible.   As long as the application is also important to the overall business, this is a great way to begin to show someone the true value of analytics.   It's also a good idea to start small and then work your way up to bigger projects later so that you're not overwhelmed and just don't run the risk of not being able to deliver.   Second, is incorporate their knowledge and expertise. You may be an expert on the application of analytics, but you are most likely not an expert on every functional area of your organization. Not even the CEO can make that claim.   Therefore, you must rely on the insight of others to help you understand all of the complexity that cannot be contained within the data set, biljana Besker including any legal, ethical, or other considerations that must be taken into account. What's more, you are demonstrating respect for their specific knowledge, which will help build trust and make them more eager to work with you. Biljana Besker Third, learn to speak the language. Again, being able to understand and communicate in the terminology used by the people you're working with will demonstrate that you are willing to meet them on their terms.   It's not a two way street, however. Avoid using analytical and statistical terminology as much as possible. If necessary, practice finding ways to explain difficult or complex concepts in an easy to understand manner. Metaphors often work well for this.   Fourth, publicize your victories and show the credit.   Sorry, and share the credit. Once you have successfully completed the project, be sure to tell your boss. Ask him or her to spread the word throughout the organization and externally, if possible.   But make absolutely sure that the credit is shared with those who assisted you in the project. This will help build attention to the power of analytics within the organization, as well as make those people you've just worked with feel rightfully appreciated and respected.   If you look closely at these four recommendations you'll notice they all have one thing in common. They put the focus on what you can do to help others.   Whether you follow these specific tips or not, as long as you promote the use of analytics as a service that can help a person solve a problem that is important to them, you will go a long way toward fostering a positive attitude toward analytics throughout your organization.   So, but how to become a successful advocate of analytics.   Put user experience first.   For companies, it can be tempting to overlook the role of the end user and focus solely on business outcomes, biljana Besker which is why the Analytics Advocate must ensure that the focus remains on the value and overall experience for end users in addition to the positive business outcomes the company wants to achieve. Biljana Besker To bring us back to our earlier discussion of low adoption and analytics strategy that does not consider the users position and needs.   It's at risk to become a strategy that is technically capable but not valuable enough to keep users engaged.   
To mitigate this risk, the Analytics Advocate must be able to explain both the benefit of the analytic strategy to the business but also ensure that the strategy is beneficial for the end users who need to make business decisions.   And push the analytic strategy to evolve. Of course, user and business requirements change over time, so once the strategy is launched, the Analytics Advocate must ensure that the strategy evolves to meet those demands.   Without repetition the strategy runs the risk of outliving its usefulness and driving adoption rates down as a result.   Instead the Analytics Advocate must monitor, manage, and drive the strategy forward to ensure ongoing utility at maximum business value.   Companies who want to introduce an analytic strategy can make themselves much more likely to achieve success by putting biljana Besker that strategy in the hands of someone who can understand end users and push the project to improve experiences and business outcomes. Who understands the analytics represents a journey, not a destination. Biljana Besker Successfully appointing an Analytics Advocate is the first step in this process.   Let me summarize what you just learned. At the most strategic level analytics allows organizations to unlock latent value from the data to gain insights, accomplish business objectives, and improve profits.   While these insights should empower everyone in the organization many organizations resist the cultural changes needed to benefit from an analytics program.   As first step executive leadership must establish and support analytics strategy.   Then designate an Analytics Advocate to engage stakeholders to unify that vision, understand and address pain points, overcome resistance to adoption, and demonstrate the value from analytics through quick win projects.   All organizations can better accomplish their mission of leveraging analytics with a data driven decision process.   Using analytics to achieve a sustainable, competitive advantage and generate significant return on investment begins with a well convinced analytics strategy and roadmap for success that is aligned with and supports the overall business strategy.   And with that said, I would like to hand over to my colleague, Sarah Springer who will show you how JMP can help you to become an analytical advocate in your organization. Thank you. Sarah Springer Hi, I'm Sarah Springer. Biljana, thank you so much for providing that great overview of what makes a good Analytics Advocate within an order to get it within an organization.   I'm gonna look a little more closely at a couple of those areas that Biljana touched on, and we're going to talk about a process and some tangible resources that will assist you and your organization in building a culture of analytics.   So how can JMP support your organization in becoming more analytical, and how can we support an Analytics Advocate? So we've outlined a process here to help you accelerate your organization's analytics growth curve.   So that process is going to go through a couple of steps. So first we're going to talk about how to build a team of data ambassadors. We're going to talk about the best way and some resources to identify key use cases and define success.   We're going to talk about how to establish an efficient data workflow.   How to educate and what resources we have available to educate and upskill yourself and your colleagues.   How to socialize your analytics successes. And then how to democratize data and the process.   
So, Biljana touched on this in her presentation, but what is in it for you? If you're an Analytics Advocate within your organization, what can this do for you as an individual?   You can be a vision setter and a change agent throughout your organization. This is an opportunity for you to make a real impact on the lives and the well-being of the people in your organization.   You can be a subject matter expert. If you identify a specific problem or a specific area of need and upskill yourself in that area, you can really be looked at as an SME within your organization.   And gain some recognition for yourself and within your organization and within the JMP community. You'll be gaining credibility.   You'll become a leader in teaching others the skills that you've learned and upskilled on.   And then ultimately we've seen a lot of our analytic champions throughout all of our organizations really have a strong resume and   advance in their careers because of the great work they've been doing at their organizations in building a culture of analytics.   Data is everywhere. Analytics is an important competitive tool.   And it's really past the point of being able to not have analytics embedded in your organization, and so what we've seen is individuals that have been an Analytics Advocate within their organization have   been quite successful, and so this is a real opportunity for you. But what Biljana mentioned is it's really also about helping others and making an impact on your organization and the world around you. And so what's in it for your organization?   Advocates play a key role in demonstrating value and ROI. You're able to pick a project, a real challenge that you or your organization is having and show the true value   of the impact that analytics can play to your organization. Adoption of JMP or an analytical tool or an analytical culture can really help, again, bring the organization into the digital transformation age.   It's at the point, right, where we can no longer not take advantage of all of this data that we have, and so in this role, you have a real chance to make an impact at your organization. And again, impact your organization's bottom line, save your organization money   by improving processes or producing less waste. Other use cases we've seen are securing time to market, and you're really helping your company to stay competitive.   So the first part of the process as you're thinking, "how do I make an impact at my organization" is to think about, as Biljana mentioned,   "Who can come along with me in this journey? Who else is feeling the same pains? Who else can benefit from a strong analytic culture?"   So looking beside you and to other departments but then also up. Who at the executive level,   what executive sponsor might be interested in some of these pain points I'm having? How can I get stakeholders to support this movement? How can we get   buy in early? And so the goal is to find reliable, passionate, accountable people that are maybe having some similar challenges as you to walk through this journey   with you and to help, you know as Biljana mentioned, show the value and prove to leadership and prove to stakeholders that this work is valuable and deserves attention and investment.   Once you have your colleagues, and you have buy in, and you have your team, what we want to do is the next step of the process is really looking at identifying key use cases   and defining success. 
So what success looks like is an important part of defining an analytic strategy. Thinking about   some common use cases within your organization. Maybe you have too much data; maybe there's data not being used; too many systems to get the job done; not a good way to share decisions;   maybe there's a lot of wasted time, you know, being able to sift through all of that so, so figuring out what is my organization's challenge.   What would success look like, how can I move the needle?   As Biljana mentioned, starting small is important. We want to think, maybe, of something that's not a huge undertaking, but maybe it has a broad impact.   So as you're thinking of the right place to start these are some things to consider. Some great resources that I would recommend as you're thinking through this if you want to look at your organization's annual report,   or 10-K if you're a public organization, or maybe there are some internal documents that are outlining for the year what your risk factors are as an organization. You can get a strong overview of some of the concerns that executive leadership has about   maybe some of the risks that they could approach this year. Often we see some risks across R&D and manufacturing that might be very relevant   to being able to solve that problem with analytics. Maybe it's time to market; maybe it's improving   any sort of defects in the manufacturing process, right, those are things outlined in your organization's 10-K   or your annual report, and that could be a really great win for you, right, if you can pull some folks together to want to solve one of those problems or improve one of those processes.   The other resource I would recommend taking a look at is   going to the JMP website and looking at our customer success stories. There's a whole library across industry and challenge that could really get your wheels turning and give you some great ideas about some possible use cases and what success might look like.   So, once you have your team, and once you have the goal next you want to think about how to create and establish an efficient data workflow. So, it's important that you're able,   in order to do great analysis, you want to have   good data access, you want to be able to streamline that process, how are you pulling analyzing and sharing that data, how are you getting the right information to the right groups.   Where is your data? Is it accessible? Is there anything you can automate? Can you make anything easier? Can you use JMP Live to share information? So there's a lot of things to take a look at. Tangible resources for this include conversations with IT. Maybe you can look at   possible scripting or automation within JMP or within your analytical tools to really make an impact to make this as easy as possible so that it's in the hands of the right people who can solve these real problems and contribute to your success and the success of your organization.   So next we're going to talk about a step that is very close to my heart--training and upskilling colleagues. I spent some years at SAS within SAS Education helping JMP users do just this.   And so I wanted to touch on, you know, once you have your team, you define your goals, you've got your data access in a good spot, we want to talk about how do we give   our team and employees and users the tools and the knowledge to execute this plan.   
What we're finding; there was a survey done by HR Dive's studio ID and SAS that was conducted in October of 2020, and they found that this is a huge need. Eighty-eight percent of managers said they believe their employees' development plans needed to change for 2021. A lot of this was coming out of sarah Springer us kind of shifting into a pandemic world and people working remotely, and folks are really asking for training and for development and for help. Out of the survey, 50% of managers said employees needed more upskilling, more reskilling, and more cross-killing and 41% of the   employees themselves said the same thing, and that when considering the types of skills employees should focus on employees needed more technical skills. Sarah Springer They really, really want to build their skill set, and so as you're thinking about how to, you know, build an analytics culture, training and upskilling is really important, and people are wanting those more technical skills so that they can make a contribution in this age of digital transformation.   So the survey also brought to light five major learning and development trends. So just wanted to highlight this as something to be thinking about as you think about your strategy.   The trends were that companies are now expected to take on more responsibility for employees and society and making sure that they're getting what they need,   that people are being taken care of. And companies need to match, another thing was that companies need to match their technological investment with the learning and development of their people.   Learning and development are much more universal, much more universal, and it's really a strong recruiting and retention skills, and again these hard skills are really in demand and so   as you're thinking about your strategy it's so important to think about how can I help my colleagues, my organization get the right knowledge and training in their hands so that they can really be impactful with all of this data that we have.   So, here at JMP we have a couple of resources I want to point out that can be really powerful to help upskill your organization. The first that we're going to touch on is the Statistical Knowledge Portal.   Then we're going to take a deeper dive into STIPS, which is our statistics course. It's a free online course called Statistical Thinking for Industrial Problem Solving.   It is award winning, and it is self-paced and a wonderful resource that many of our customers are using to provide analytical development to their employees.   And then, finally, we do have some formal SAS training and some resources that I do want to point out as we go through this process.   So the Statistical Knowledge Portal is a great site, I've put the link there for you,   that has information in all of these different areas that I've listed. So, it's a great way, if you have somebody that needs to know a very specific skill,   they can go on here, they can pull some resources and skill up fairly quickly.   It's a great way to get them started, to get their feet wet, to develop some knowledge, to get some tips and tricks. I would highly recommend spending some time on this website.   And then you as the Analytics Advocate can really help,   you know, drive the person to what skill they might need based on your project, right, so I think there's a lot of collaboration that can happen here.   
But I do want to point out that this is a phenomenal free resource on the JMP website and has a lot of great statistical information for you.   The next one I want to point out is STIPS. So, all of these modules over on the right are self-paced. They're deep dives into the topic area, there are hands on exercises,   and it's really going to help you get up to speed and understand that statistical concept, so as you're   working on your project as you're working towards your goal, think about different areas of this course that might be helpful.There is a great overview module at the beginning as well, that I would recommend, that talks about   what different processes you can use to begin to be thinking statistically throughout your organization.   So it starts at the beginning, and then it goes all the way down to advanced modeling. So it can really meet you where you are, and we'll talk a little bit more about this course.   So what I love about this course is this is really something that JMP has put out because they want folks to be strong in analytics. They want folks to understand statistics and to understand the why behind what we're doing.   And so we've put out some additional resources to help companies upskill their teams, so you can take this course in a self-paced format, but we've had many, many customers   want to use these materials in a different way. We have customers doing lunch and learns throughout their organization, having sessions where they'll take a specific concept from the course and have discussion groups.   We have professors and universities using STIPS, or some of this material, as prerequisites or even within their statistics courses.   And so what we've done is we've made teaching materials that has put some of this material into PowerPoint slides for you to use sarah Springer at your organization for some internal training, and we wanted to make that easy and accessible for you. So, going to jmp.com/statisticalthinking, there's an online form on the right hand side of the page Sarah Springer where you can fill that form out and get these materials to use to help upskill your team.   We also have student activity reports available, so when somebody takes this course, they come to the SAS website,   and they take it within the virtual learning environment, so if you do want to use this within your organization as part of your internal training, or perhaps   as some sort of prerequisite for another training, you may ask for student activity reports   in order to have an understanding of who's completed what. There are some opt in requirements there to make sure we're complying with privacy laws, but I do want to let you know that that is available if you'd like to use STIPS in that way.   And we do have some other requests here and there that don't necessarily fit in these two buckets, so if you require more customization we do also encourage you to talk to your account teams about that.   There is also on the right hand side of the Statistical Thinking page on the JMP website there's an option   to register a form for this third bucket so someone can reach out to you and talk to you a little bit about what else is available.   But I just wanted to provide a quick overview on this. These are really tangible resources that can be leveraged within your organization to creatively upskill your team.   And then, finally, the third training resource I wanted to touch on is formal SAS training. 
SAS has incredibly   strong, relevant trainings, hands on trainings that provide a real great depth and understanding of different concepts.   So there's lots of different formats for individuals, large groups, small groups, and I've put the link up here, so you can go check out those courses as well. It's a really great way to upskill your team and then make sure they have the right tools.   And if you don't know where to start, I definitely wanted to highlight a tool that SAS Education offers called the Learning Needs Assessment.   It's a data driven survey that can be distributed to your team according to learning area, you know, of what major areas that we often see our customers needing.   Maybe the the major courses that SAS Education offers around Design of Experiments or scripting or ANOVA and Regression, right, these are great resources.   And if you don't know where to start, we want to be able to survey your team, identify what their preferred learning style is, identify their competency in these areas   and then put it in a report that's easy for you and managers and executive leadership to understand.   And then we would work together with you to make great recommendations those, and recommendations could be use of STIPS,   it could be use of formal training, it could be complimentary resources from the Statistical Knowledge Portal, but it helps give you an idea of where to start if you're not quite sure according to your project and your goals what your skill gaps are.   Sometimes you need a little help identifying those so that's what this is for.   So once you've got your team upskilled and once your team is trained in the areas   and you're working on your project, it's time to document those successes, right? Biljana touches this on this as well. You want to be able to document that as a proof of concept, show the value to your organization,   continue to get that commitment and investment in your work, your team's work, in the power of analytics. Continue to help your organization move towards digital transformation, and so   being able to document these successes are important. A couple of resources that I've found helpful to do this, I think, the main one is our customer success program. We do have a great program I mentioned earlier, where   you can get these stories published on the website, but we've also helped some organizations with internal stories.   You know, ask your account team for help. We want to help you document these successes, and so we can certainly help you do that, and then often we can help you do that if you want to tell a story, and JMP we would love to help you show the impact that you've made for your organization.   And then finally being able to democratize data and the analytic process. So this is one of those steps. How can we take what you've done as a group   and then spread this further throughout your organization.   You know, once you've done this and you've documented your successes, I'm sure you're seen as effective as a leader.   You're probably well respected, so now you get the opportunity to make even broader of an impact on   your colleagues and and bring them along with you in that success and make an impact on your organization. So what you can do from here is really empower more people to be more data driven.   And, you know, I think using things like I mentioned with some of the STIPS tools. 
Maybe you're leading lunch and learns, maybe you're creating a user group,   maybe you're doing an internal newsletter about the power of analytics, maybe you're working with the JMP team to do doctor's day in sessions, or   you know, different sessions around the features that have been very helpful to you with your project, so this is your opportunity to help others.   Help others at your organization make an impact, and help your organization shift to be more data driven in today's digitally transformed world.   So I want to leave you with some tangible next steps. We've gone through a process   of how to build a culture of analytics, and as a next step, JMP has a great resource jmp.com/advocate where you can go and learn more   about each of the steps that we've outlined today and what resources are available to you that correspond to each step, so today's been a great overview,   but if you do want to take a tangible step to move towards an analytical culture within your organization, I would highly recommend that you go here   and check it out. And then don't hesitate to reach out to your account team. We're all here to help, and we want to support you in making an impact at your organization and around the world.   Here are sources for your viewing pleasure and I just want to thank you very much for your time and attention today Be well.
Anderson Mayfield, Dr. (assistant scientist), University of Miami   Coral reefs around the globe are threatened by a plethora of anthropogenic stressors, most notably climate change. There has consequently been an urgent push to better understand the fundamental biology of key coral reef inhabitants, understand how these organisms will respond to future changes in their environments, and make data-driven predictions as to the relative resilience of disparate coral populations. In this talk, l showcase a complex, "molecules-to-ocean basins" data set featuring molecular (e.g., proteins), physiological (e.g., growth), environmental (e.g., coral abundance), and oceanographic (e.g., temperature) data to showcase how I have been using JMP Pro 15 and (more recently) JMP Pro 16 to model coral behavior and make predictions as to which colonies/reefs may be at most risk. I also examine which reefs are expected to demonstrate resilience to global climate change and disease. Through this update to my Discovery Summit Americas 2020 presentation, I hope to, more generally, show progress in forecasting coral health and resilience using the ever-growing data sets and analytical power now at our disposal.    In addition to the recorded presentation itself, I am uploading the following files: the three JMP tables referenced in my talk (each with embedded scripts) and the Powerpoint slide deck. I actually passed over some "hidden" slides in the presentation that provide additional context, as well as additional approaches employed. Of note, I would like to extend a "shout-out" to @DiedrichSchmidt for sharing and customizing his excellent machine learning GUI, which I have used extensively to create thousands of neural net-based models with my coral health data.        Auto-generated transcript...   Transcript Hi everyone, my name is Anderson Mayfield and I'm an assistant scientist at the University of Miami's Cooperative Institute Marine and Atmospheric Studies. I actually work across the street from University of Miami's marine lab NOAA's Atlantic Oceanographic and Meteorological Laboratory as part of their coral program. Today I'm going to be talking to you about some of the research I've been working on over pretty much my entire career, which has been trying to develop tools for predicting the fate of reef corals. So this is actually an update, or a continuation, from what I discussed last year, my Discovery Summit talk, but I know a lot of you will maybe not heard that or even if you did you may not remember everything, it's totally fine I'm going to give you I'm going to set the stage for the research that I'm doing. Talking about some problems facing coral reefs, give a little bit of a recap of where I was at last year. Really excitingly, not probably two months after last year's Discovery Summit talk, I made a kind of a big breakthrough in my coral predictive modeling progress and this was largely in part due to some of the new features and JMP Pro, so I'm going to share these kind of hot off the press data today. And that's getting to this idea of can we predict the fate of individual coral colonies? So predicting the fate of a coral reef, so the entire assemblage of coral, this is actually easier because coral reefs are basically in bad shape across the globe due to rising sea water temperatures. So corals have dinoflagellate algae living inside of their cells, they photosynthesize like a plant and they translocate the energy that they produce to their coral host. 
Problem is when the water gets too hot this messes with their ability to photosynthesize properly. Corals either digest them or kick them out, the end result is the same corals effectively starve to death. And this is a problem across the entire globe so predicting the fate of a coral reef, we pretty much already know these these are ecosystems in decline. But on an individual level you do see heterogeneity, there's coral species that fare better than others, there's genotypes within species that do better than others - that are bleaching resilient when others are bleaching susceptible. So something I've been interested in my whole career is what drives this resilience and is it something that we can actually predict? So I liken this to being kind of a coral actuary. Can can you send me a biopsy from your favorite coral and I can do some analyses and say hey, you know this coral has six months to live, this one is going to be fine for decades, this is this is kind of what I've been trying to do. Because the way we currently assess coral health is retroactively. We go out there and do our surveys and we basically quantify the amount of dead coral. So these are important data that need to be collected, but you didn't really do these corals any favors by this sort of approach. This would be analogous to a doctor telling his patient who had just had a heart attack, hey you had really high blood pressure, you know that patient would have liked to have known that weeks, months years beforehand, so they could have done something proactive - medicine, lifestyle change, you know better diet and exercise. So you know I think it's unacceptable to assess the health of humans in that retroactive fashion, so why can't we have this kind of more proactive approach for corals as well? So there are predictive tools out there, and some of them are actually developed by NOAA, actually most of them are developed by NOAA, my employer, that are trying to make projections or predictions about coral health. They're exclusively using temperature because there's just this well this well known well studied link between high temperatures and coral bleaching so it makes sense to hone in on temperature as a predictor. So this is looking at, actually United Nations a report that came out this year but a lot of the authors are working at NOAA. This is looking at the year of onset of what they call annual severe bleaching. So this is the year, at which bleaching starts to happen every single summer. So corals can recover from bleaching, they can acquire new symbionts, they can eat, they can weather the storm, to a certain extent, but when bleaching starts happening every single year, they lose the ability to recover, so this is why the annual severe bleaching year is kind of is what I'm calling on at the point of no return. So this is pretty grim because you can see some of these years in red, 2015, 2020, it's obviously already passed, and this is looking at a map of South Florida, so this is where our field sites are. But we know this is not actually 100% true. Yes, there are reefs within these pixels that are already bleaching every year but there's plenty of others that are not. So one of my goals is to improve the spatial and temporal resolution of these temperature-centric models, because these temperature-centric models don't consider what's there, they don't even consider whether or not there's corals there at all. It's just looking at temperature patterns. 
So they don't factor in whether or not there's resilient species there, whether there's resilient genotypes, but then we need to go out, they're not grounded in truth at all, frankly. We need to go out there and do our ecological surveys we're looking at coral abundance, biomass, biodiversity, what kinds of coral enemies are out there, certain types of macroalgae and coralavores, but even that I feel is not enough, because you can have a reef that is absolutely carpeted with corals like this one, you see here in this image. An ecologist might even say hey this looks really healthy it's high abundance of coral, the water looks pretty clear, there's not a lot of algae, you know to even the trained eye, you might guess that this is a healthy reef but if I know personally as a coral physiologist that this is a low resilience coral reef because these species are weedy they're the ones that don't keep a lot of energy in the tanks for when the seawater rises, so this this inherent disconnect between abundance of corals and health. But if you think about it, this is not that surprising. We don't manage human health this way. I don't go around downtown Miami where I live, and say hey this neighborhood has a million people, it's two times healthier than this neighborhood over here with half a million people, I mean it might be to an economist healthier. But if you actually look at the health of the residents, there may not be a correlation. You might say, hey this bigger city has you know better health care than a small town. So in certain situations there might be some link between population density and human health, but you can find just as many places, where that's not the case, you know. Higher population density areas tend to have more crime, for instance, so it kind of blows my mind that this is still the way we manage that or try to make conjectures about the health of marine organisms is simply by counting what's out there. I think you know you wouldn't do this with with people, and I think we can do better. So what I really want to do is not just look at temperature. Temperature is going to be critical for understanding for help, but not in isolation. Similarly, ecological factors, the organismal abundance, the biodiversity is also going to be critical assess. It it's not going to tell you about persistence and longevity in and of itself. What I want to do is, I want to factor these things in but also consider the physiological and the actual health of organisms in these ecosystems so that is a very complex, convoluted model and I haven't actually worked out how it's going to be presented, or how it's going to work right at the moment, but I have some ideas. Today what I really want to focus on is what I've encircled here so simply trying to use the physiological data to make a predictive model with the capacity to forecast whether a coral is going to resist bleaching or be susceptible. And for these kind of predictive analyses I gravitate towards molecular biology, because you should see subcellular shifts in behavior that are reflective of some kind of aberrant physiology before you notice something wrong with the naked eye. So getting back to that heart attack analogy you're going to document high cholesterol levels or high blood pressure in an individual weeks, months, maybe even years before they're likely to have cardiac arrest and that's why when you go to the doctor they draw blood. 
There's we have these well developed biomarkers like blood sugar and cholesterol for making conjectures about your your future health. They're not going to be able to tell you, you know the day you're going to kick the bucket, but these are biomarkers and analytes that have high predictive power in terms of your future health prospectus. So cholesterol is not a particularly good biomarker for coral, but other things may be, and so at NOAA we take this multi- we call it multi-omics approach, where we look at all the different molecules synthesized by our organisms of interest, corals in my case, to see if we can figure out which ones might be reflective of a state of stress or a state of resilience. I actually started off in the gene expression world, so the messenger RNAs. But recently I kind of switched over to proteomic research just because proteins are actually carrying out the work in the cells. So a lot of times you don't see a strong correlation between the mRNA expression and the protein levels. mRNAs could still be really good health diagnostic biomarkers but they're not going to be too useful for mechanism. As a physiologist I want to have my cake and eat it too, so I want to be able to know what's happening in my coral cells, how are the coral cells responding to high temperatures, which proteins are involved in thermo acclimation. These are the kinds of questions that can be addressed by what we call proteomics, which is the assemblage of all the proteins in a sample. So you get the mechanistic data, but you also can have these proteins serve as health informative biomarkers, so this is kind of my slide where I try to condense 20 years of research into a single slide just to get everybody up to speed so. Determining the optimal suite of analytes. So I just talked about the proteins. Do I think proteins are the end all be all and they'll never be a better thing to measure in a coral? And that's why I put the check here - absolutely not. And you'll see later in the talk that they're actually issues with with using proteins to make health inferences. There's probably something better out there, so this is still a work in progress. Unfortunately corals are non model organisms, so for all these things, I wanted to measure I had to spend years developing the methodology myself. Molecular benchwork, in particular. You want to use any sort of analyte, be it a gene, a protein, a lipid, as a biomarker you're going to need to accommodate the natural sources of variation. And these are things like the light cycle. So because corals have photosynthetic dinoflagellates inside of their cells they show huge swings in physiology across the light-dark cycle, across the tidal cycle, over seasons, these are kind of the non-sexy, but kind necessary evil projects you need to do to ultimately get towards modeling coral health. Once you have a grasp of how the corals are getting on their day to day lives, then you do these environmental challenge studies where you expose corals to stressors or or in the laboratory or you're looking at stressor regimes across environmental gradients in the field. And this is actually what the coral field as a whole has done the best of, there's thousands of papers out there on coral environmental physiology and I'm going to talk a little bit about one such example of that later in the talk. 
But where we're stuck and were you don't see any work right now is this idea I've been getting at of using coral health data to actually generate predictive models of coral stress susceptibility so that's what I'm trying to make inroads in. And I think this is because most physiologists and molecular biologists are much more familiar with the kind of descriptive side of things, what happens in the organism as it's dying. This is not to belittle this sort of research, it's pretty much what made my career to date. It's easier to publish these kinds of papers. Stick coral reef tank tank temperature up look at the protein levels. I've published 10 papers on that, and you do need to do this type of work to get at this predictive approach. So but I do, that being said, I do want to kind of make this shift for me personally from being kind of this undertaker who's effectively writing coral obituaries, essentially what these papers are, they are generating useful data but they're not doing the corals any good. And we take the data that we got from those inherently descriptive projects and use them in a predictive capacity, try to figure out which corals are going to be the climate change winners and which ones are going to be the losers. So going from being an undertaker to this, you know vet or the actuary that I mentioned earlier. And I don't want to give you guys the impression that I'm you know this amazing modeler. This is something I really just have been teaching myself and picked up on the fly, but I attribute a lot of the success in this and making this kind of seamless transition from doing more descriptive stuff to predictive stuff this is driven by by JMP because because my familiarity with the Fit Model platform, for instance, really made it easier for me to kind of make this description to prediction transition so, for instance, what I would have done in my entire career up until about a year or two ago is the example on the left. You see, these coral proteins in the y box. These are the esoteric accession numbers you don't need to pay attention to what they mean, they're just know they're proteins. Construct model effects showing the things that I would be interested in temperature, genotype of the coral, when they were sampled, things of that nature, and you can publish a nice paper looking at these sorts of data sets. Now what I'm doing is I'm basically flipping that on its head I'm taking all those proteins and moving them down into the construct model box, alongside the experimental and/or environmental data but what's going to be in the y is going to be what I'm calling here the coral bleaching susceptibility. I give it another name later, but the idea is the same. It's the likelihood that a coral is going to bleach. So I think JMP has been really instrumental in kind of helping me make this this transition from purely descriptive stuff to kind of getting my feet wet and predictive biology. So I do want to recap a little bit of the descriptive stuff and feature some of some of my some of the tools within the JMP Pro package in particular that I'm a big fan of. So I'm going to play around with a little data set we have from this coral here - orbicella faveolata - it's one of the one of the more resilient corals in the Florida Keys so climate change effects are so bad to where you know we don't have the luxury of necessarily working with all the coral so you basically got to work with the ones that are already inherently a little bit stronger. 
So we took corals from different reefs, exposed them to different temperatures, different amounts of time and looked at their protein concentrations, because we need to have kind of an idea of what biomarkers might be, we might might be useful in model building. How does temperature affect the proteome of this coral, all manner of descriptive things that you might want to address, so this is very similar to what I would have shown before. This is my model box, I do a lot of univariate stuff where I'm trying to find you know which proteins are most affected by temperature, by the reef site, things of that nature. I do a lot of multivariate work - principal component analysis, multi dimensional scaling. Looking for relationships amongst samples, looking at multivariate treatment effects and we're going to be looking at some of this here in a second. So last year, what I showed and I published this paper since then um, this is a multi dimensional scaling plot made in the JMP Pro 16 multivariate platform - actually what this would have been 15 but all I want to show you here don't get too bogged down in the details. And I apologize if you're colorblind. But the green, which are the bottom three samples on the right, these are control corals, so they're at the control temperature. The three at the top, are at 32 degrees, which is pretty much a death sentence for most corals. And you can see after several weeks of exposure, their proteomes, so their protein profiles, start to become more stable and that's important. You need to know that there are effects of temperature on the proteome before you do any kind of predictive model. Those data were looking at presence absence data. So whether or not a protein was there or not and it was about seven or 800 proteins. Since then, the methodology improved, so this is a good thing about working in molecular biology, the methods are always getting better. So what I did I took the same samples, but rather than measure seven or 800 proteins as presence absence I took a subset of 86. But the data are now fully quantitative. So I have you know almost tenfold fewer proteins, but there's much more resolution with respect to the concentration, so the data set I'll be playing around with now is only featuring 86 proteins. So this is a typical distribution of what these proteins look like they're never normally distributed they're always skewed so this limits me in terms of using exclusively parametric approaches, but that's not a problem with JMP. So let's go over into a JMP table. So this is the first one, I want to show you. You've basically got the two features of this table I really want you to focus on are its wide nature. So proteomics is expensive. $200 a sample, my budget's limited, in this particular experiment I only had 20 samples. But I have 86 proteins. What that means is can't do typical multivariate ANOVA to look for treatment effects, but that doesn't mean that I can't do any sort of useful multivariate analysis. One clever approach I got as a suggestion from JMP developers is to go do my traditional multidimensional scaling. So here's my 86 proteins that then log to transformed I'll put them up here. And I know from having done this before that four dimensions is going to give me a pretty good solution. So here's the multidimensional scaling plot, these are the various fates of my sample, so healthy controls, actively bleaching, we'll get more into that later. 
But I want to do here is look down at stress, I see okay it's getting a little bit high usually like to be below point one. I could get the stress even lower by increasing the number of dimensions, but I think four dimensions it's going to be okay, so what I'm going to do is I'm actually going to save these coordinates. I could save the similarity matrix as well because I'm really more interested in the similarity amongst these samples so I'm going to take these and they're now going to be my (???), so now, I have a situation where I have fewer y's than samples, so I can do a traditional multivariate ANOVA. And my model effect, I want to have this done. CHD lab, what does that mean? That's the coral health designation so that's whether or not the coral was resilient or stressed. So whoops. So these this is BLR, that means bleaching resistant, I'll use this term throughout the talk. BLS, bleaching susceptible, and this is looking at a plot of these four dimensions for these two groupings of samples so I'm going to run a multivariate ANOVA. And I see a marginal effect of health state, and this is again something you would want to know. If these bleaching resilient corals have a significantly different proteome than the bleaching sensitive corals. So it's not strongly significant is kind of on the border, but with an environmental data set like this, I would happily report this in a publication. Another thing you can do that's probably the better approach is to take those same four coordinates The MANOVA is going to get tricky with small data set, a small wide data set like this, where you've got a bunch of different environmental parameters, so instead I'm actually going to use Partial Least Squares platform that I'm a big fan of. And I'm going to take my experimental data, these are things like where the coral was from, the temperature it was exposed to, the genotype and I'm going to do a response surface. And then what's running. I'm actually going to bump the KFold down to four just because the data set is so small. And this is not exactly the output that I wanted, I think I must have included something a little bit different but that's Okay, so I've saved the script that actually want to show you. Is here and really I'm not even too concerned with the number of factors. You could see here, I chose three. Really, all I want to show you is this, it looks very chaotic and messy, but this is known as the correlation loading plot, and I think this is really an unsung underutilized tool in the entire JMP Pro package. Just because of how much you can gain from this figure, so this is showing where my samples fall out with respect to one another, then I can layer on my environmental factors, genotype, site temperature time. In this case, I'm using the coordinates so the y's are going to be more difficult to interpret, but I could have done this with the raw data as well, and then you could really start to look at how your analytes, your environmental variables and your samples all kind of mesh and I think this is a really powerful exploratory tool that that may be less familiar to some of you out there. So that's one of my favorite multivariate descriptive tools, I like to use in the JMP Pro package. So while, that was all experimental data, so while we had those experiments going on these corals they're my patients. They're sessile. I know exactly where to find them go out there on the reef. I can sample them in different seasons. 
I'm taking tiny little biopsies so it's not going to be perturbing their their health to any great extent. So I have these coral samples through space and time and I know when they bleach, where they bleach, which ones bleached and which ones didn't. So now, I can go back backwards in time through the archive and hindcast looking at corals that haven't bleached yet and trying to see if I can make predictions about which ones did and which ones didn't. So just to give you a little bit more info on the sites they're down here in the upper Florida keys, we have two inshore sites, The Rocks and Cheeca Rocks, two offshore sites Little Conch and Crocker Reef. And it's important to know that the inshore reefs tend to be much more robust you still see big healthy corals not everywhere, but some places. Offshore you're dealing with these little press, so we have a nice gradient of resilience in situ. This is going to be important for model building you wouldn't want to build or test your models using exclusively strong corals or exclusively the weak ones, you're going to want to have a mix of both. This is just a quick Graph Builder plot I put together, and this is looking at color scores, so you want to be a five. Five basically means you're totally healthy. Zero means you're stark white and you're about to die. In reality three to four is getting pretty stressful. So you can see here in August this is in the middle of a bleaching event, we have some pretty market bleaching at these four sites, even the more resilient inshore sites were bleaching to some extent. But then, by October they've recovered and then December they've recovered, but you see this kind of anomalous behavior and I think this is driven by a disease. So this is getting back to kind of my my jargony coral health designation that I mentioned earlier, so. This is basically, you know I want to have this ultimately or eventually be maybe a continuous variable, maybe a one through 10. But right now we're trying to predict one of essentially three categories. You're either bleaching resilient or resistant, you're bleaching susceptible or you're intermediate. So the bleaching resistance is going to be this green line at the top, your color score doesn't really change over the course of the bleaching event. This is another Graph Builder plot, so this is looking at the temperature on the bottom half. So an intermediate coral is going to bleach you know a little bit, but then it recovers. The bleaching susceptible one is either going to bleach markedly and recover or, in some cases, and may not recover it all. So these are the three essentially the phenotypes that are going to be the y's and my model. And this is convoluted by design don't worry, this is basically showing the different ways, you could ultimately go through the model building process. So on the left you'll see which data are you going to use to train and validate the model. The model itself, what are you going to use test the model and then what you're ultimately trying to predict. So if you're just using lab coral data to make a predictive model in which you're testing it with more lab world data, that's going to give you the power to predict coral health in the lab, but that's not really what we're after. I mean that's okay for publication, but we want to know, we want to be able to predict what's going on in the field. 
But then you run into this issue of how, how do I feel about using lab data to make a predictive model that's attempting to forecast what's going on in field, you know that's always going to be a precarious issue of using this lab data to make predictions about this less constrained field involved. But we're going to try it anyway. So again, this is using this colony health designation as the y. The predictors, as I mentioned earlier, are ultimately going to include more than just the proteins, but for the sake of simplicity we're going to use the proteins today. And this is using JMP Pro 16.1 I have X G boost add-in that I really like and I really recommend. This is more of a note for myself, but I do want to mention one important thing, and it gets to one big drawback of proteomics and it's stochastic in its nature. So remember I sequenced 86 proteins in my laboratory samples So those are the ones I'd like to use and my predictive model. When I went out there and measured proteomes and my field samples I didn't get those 86 proteins I got a completely different set of proteins. So, then, I had to go and try to figure out which proteins were actually basically found and all my samples, so this dramatically whittled down. The number of proteins that I could use, it actually went from 86 to 31 all the way down to five proteins and that worried me because I really wasn't sure if five proteins would be enough to give me any kind of predictive power but let's see. So that data set is here so basically I took a subset of 31 proteins that were found in all my field samples, but what I want to do today is, I really only want to focus on those proteins from the corals that I sampled in July 2019. I only want to know these coral samples because this is before the bleaching event. I want to know is there something I can detect in these corals before they bleach So, then, that dropped my 31 down to five so I'm going to try to make a predictive model with these five proteins. And I'm going to use my favorite new feature of JMP Pro which is known as model screening, because this is going to allow me to test multiple different models in parallel. So I'm going to put my coral health designation for the lab here. I'm going to put my five proteins here. I'm going to give all these different options a chance you know what I actually don't have the XG boost installed on this computer but that's Okay, I think we can, will still be able to find a decent model. I'm going to use a Kfold cross validation of five. I do want some of these options here. And let's see what we get. This is going to take a moment to run beca8se it's going to check 20 different models in parallel, using all these different factorial combinations and we have some diagnostic data here, but I want to get right down to the validation sets. So they're essentially ranked here, so we have a neural boosted model that seems to be performing well. Because I'm concerned with accuracy I tend to gravitate towards the misclassification rate. We have a generalized regression model, let's check that out so I'll select it. It's looking pretty good. I'll run it. You see all the diagnostics here, I do want to take a little bit of time to look at the Profiler because this is going to tell me which proteins are contributing the most. So what I had said earlier, I apologize, I need to reset it now. I usually have the desirability to where one of my treatments is maximized, be it the bleaching susceptible or the bleaching resilient. 
Then what I can do is say, hey, show me what it takes to be a bleaching resilient coral, so I'm actually going to minimize the bleaching susceptible category. Well, it's only showing me one option; I think it's already been set to that, so that's okay. So it's showing me, in this case, how the proportion of samples falls out with respect to bleaching susceptibility as I change the levels of these proteins, and this could be useful for things like genetic engineering, if you want to try to find the proteins that are most involved in the resilience of coral. You're going to want to play around with the Profiler, but what I'm more interested in today is actually the predictions themselves, so what I'm going to do is publish the prediction formula here. You'll notice one of the proteins dropped off, so one of them was not deemed to be useful in the actual model building. And this is something I've only ever done in the last two or three days, believe it or not. So I've got my July corals in this data table, and I want to see the predictions it makes for that data table, so let's go over and see what it guessed based on that generalized regression model. And I'm just going to quickly run through here and tell you, because it's only 12 samples, whether or not the guess was right or wrong. So this is right. Sorry, the R key doesn't work on this computer, believe it or not. This is right. Right. Right. Yes, there's definitely an easier way to do this. This one is wrong. Right, so now it works again: right. Right. Right and right. So in this subset of 12 samples, using this four-protein generalized regression model, it got the bleaching likelihood right in 11 out of the 12 samples, which is actually really cool. I've run the simulation various ways over and over again, and that's actually better than what I've typically been batting, so that's, whatever, 92 or 93%. That is really exciting, especially because that's only four proteins. So, my average from doing this sort of simulation is about 80 to 90% accuracy, which for me is incredibly exciting. To a physician that would actually be terrible; if you're only right about the prospect of a cardiac arrest event in 80% of your patients, that would be a huge failure. But given that this is kind of new for corals, I'm okay with an accuracy of 80 to 90%. And, what I didn't mention, that subset of five proteins, which then got whittled down to four, was only looking at the host coral. I've removed the symbiont proteins, because I literally just got those data a few weeks ago. When I add the symbiont proteins in, I think the accuracy is only going to go up. And also remember, I was using lab data to make predictions about field coral behavior, so you wouldn't expect it to be 100% accurate. What I should have done, and what I'm going to do, probably within a few hours of finishing this talk: I've got all the field data, I've got all the lab data, and I can go ahead and use both of those data sets in the training and validation, and make a model in which I predict field coral behavior. I'm confident that accuracy with that approach is only going to go up. You could argue, why would you even want to use the lab coral data at all when you've got this field data set? And I think that could be a very valid point.
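Counting right and wrong calls by hand works for 12 samples, but the tally is also a one-liner in code. A minimal sketch, assuming hypothetical file and column names, and approximating the generalized regression fit with a penalized logistic model (which is not exactly what JMP fit):

```python
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

lab   = pd.read_csv("lab_coral_proteins.csv")        # hypothetical lab training data
field = pd.read_csv("field_corals_july2019.csv")     # hypothetical July 2019 field samples
proteins = ["P1", "P2", "P3", "P4"]                  # the four proteins the model kept

# Fit on the lab corals, then score the pre-bleaching field corals
model = make_pipeline(StandardScaler(),
                      LogisticRegression(penalty="l1", solver="saga", max_iter=5000))
model.fit(lab[proteins], lab["bleaching_phenotype"])

pred = model.predict(field[proteins])
print("accuracy:", accuracy_score(field["observed_phenotype"], pred))   # the talk reports 11/12
print(confusion_matrix(field["observed_phenotype"], pred))
```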
From a biological side, the actual nature of these proteins is incredibly interesting to me, and I just uncovered what they were last night. I'm running out of time, so I'm not going to go into them, but suffice it to say that what we're able to mine out is going to be incredibly interesting as well. One of them is involved in prey capture, which I think could be super cool for trying to develop a mechanism. These are really small data sets, just dozens of samples. So do I think I made the world's best coral stress test? No, this is very much a work in progress, and there are other methodological issues that need to be worked out. But what I ultimately want to do is develop this kind of stress test so we can go out there and do what I'm calling coral reef triage. I could take my little samples, do my proteomic or other molecular analyses, input the data into JMP, which spits out these resilience guesses, and I tell a manager, hey, this reef over here looks to be in really good shape; let it be for now and keep an eye on it. This one over here, all the corals were deemed bleaching susceptible by my model. If there's anything you could do to mitigate or to give those corals a chance - clean up the water quality, curb overfishing - those are the ones where I think you should triage your effort. So really that's what I want to be able to do with these kinds of data sets. Of course the data sets need to grow. I do want to compare them to the temperature-based models that are out there, although the scale is inherently different. And I really just want to be able to have either this kind of visual or just a map where I say, here are the bad spots, here are the good spots. I'm going to do all the dirty work; I want to be able to distill this into something that a manager or layperson could easily digest. I think once the kinks have been worked out, and once we have bigger data sets, incorporate more coral species, and extend the geographic range, we really could be off and running, using this coral health predictive modeling approach as a decision-making tool. Not just for monitoring, but for looking for resilient genotypes that might be useful for coral farming and restoration. You could also use this approach for tracking the success of mitigation projects and things of that nature. So I think there's a lot of potential for this kind of coral health predictive modeling approach for more proactive marine management, and I hope this is something that we can continue to grow at NOAA and the University of Miami in the coming years, especially as these ecosystems become ever more imperiled. So with that, I'd be happy to take any questions. Again, I want to mention that I'm not a modeler by training, so I definitely welcome any suggestions. Don't take any of this as dogma; if there are egregious misinterpretations of data or anything like that, you won't hurt my feelings, and I will definitely appreciate any questions or feedback. So with that, I'll end my talk and be happy to hear from you. Thanks.
Mia Stephens, JMP Principal Product Manager, SAS   Predictive modeling is all about finding the model, or combination of models, that most accurately predicts the outcome of interest. But not all problems (or data) are created equal. For any given scenario, there are several possible predictive models you can fit, and no one type of model works best for all problems. In some cases, a regression model might be the top performer; in others it might be a tree-based model or a neural network. In the search for the best performing model, you might fit all of the available models, one at a time, using cross-validation. Then, you might save the individual models to the data table, or to the Formula Depot, and then use Model Comparison to compare the performance of the models on the validation set to select the best one. Now, with the new Model Screening platform in JMP Pro 16, this workflow has been streamlined. In this talk, you learn how to use Model Screening to simultaneously fit, validate, compare, explore, select, and then deploy the best performing predictive model.     Auto-generated transcript...   Speaker Transcript Mia Stephens Hi, I'm Mia Stephens, and I am a JMP product manager. And today we're going to talk about model screening. Model screening is a new platform in JMP Pro 16, and it allows you to fit several models at the same time, which enables you to streamline your predictive modeling workflow. So we're going to start by talking a little bit about what we mean by predictive modeling, and this is in contrast to more classical modeling, or explanatory modeling. We'll see the types of models that you can build in JMP Pro, and we'll talk a little bit about the predictive modeling workflow. And then we'll see how to use model screening to streamline this workflow. We'll see several different metrics for comparing competing models, and we'll use two different examples in JMP Pro; I'll be using JMP Pro 16.1. So first, what do we mean by predictive modeling? Most of us have had statistics courses at the university or college or through our company, and typically, when we learn statistical modeling, it's explanatory modeling. What we mean by this is that you're interested in studying some Y variable, some response of interest, as a function of several Xs or input variables. So we might be interested in identifying important variables; maybe we're in a problem-solving sort of situation where we want to try to identify the most important Xs, the most important potential causes. So, as a result of explanatory modeling, we might be able to identify, say, that X1, X3 and X6 are potential causes of variation in Y, and this might lead you to study these variables further using a designed experiment.
We might also be interested in quantifying the nature of the relationship so, for example, we might be interested in understanding how Y, our response, changes on average, as a function of the Xs. a one unit change in X is associated with a five unit change in Y. So this is what we typically think about when we talk about modeling or statistical modeling or regression analysis. Now, in contrast, predictive modeling is about accurately predicting or classifying future outcomes. So in classical modeling, we're typically making statements about the mean, the overall mean. In predictive modeling, we want to know what's going to happen next. So, for example, will this customer churn? Will this customer take the credit card offer? Will the machine break down? So we're interested in predicting what's going to happen next. So we're interested in accuracy, accuracy of our predictions. In predictive modeling, we tend to fit and compare many different possible models and this involves or can involve some very advanced techniques. And some of these techniques may not be very easy to interpret. So if you think of neural networks, neural networks are typically considered to be somewhat of a black box. And with these more complex models, overfitting can be a problem, and what I mean by overfitting, is we tend...we can fit a model that is far more complex than we need. So in predictive modeling, we tend to use model validation. We use model validation to protect against overfitting and also against underfitting. And what model validation involves is splitting our data, and a typical way of splitting our data is to have some of our data held out... well, some of our data is used to train our model, some of our data held out, and we might have a third set that's also held out. And the way we use these data is that we fit our models using the training data and then we see how well the model performed on the validation data. And the third step might be used, and this is often called the test set, because validation...the validation data in JMP is often used to control how big the model gets. So we'll see this in action in a few minutes. I won't go into very many details around validation but if this is a topic that's relatively new to you, there are some slides at the end of this deck that I borrowed from our free online course, STIPS, and it illustrates why validation is important. In JMP Pro there are lots of different types of predictive models that you can fit. Some of these are available from fit model, so if you have a continuous response, you can fit a linear regression model. If you have a categorical response, you can fit a logistic regression model. This can be an ordinal or multinomial or binomial regression model. You can fit generalized linear models or penalized regression models like elastic net and Lasso and ridge regression. There's a predictive modeling menu with many items, so for example, neural networks. And you can fit neural networks in JMP Pro with two layers, with multiple nodes, multiple activation functions, with with boosting. And generally in JMP, you can fit models for categorical responses or continuous responses using the same platform. You can fit a variety of different types of trees, classification trees if you have a categorical response, regression trees if you have a continuous response. You can also fit some more complicated types of trees, like bootstrap forest and boosted trees. And you can see that there are other methods available as well. 
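For readers who want to mimic the validation idea outside JMP, here is a minimal sketch of a three-way split with a fixed seed. The 60/20/20 proportions and the file name are assumptions for illustration, not values from the talk.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("diabetes.csv")                     # hypothetical export of the JMP data table

# 60% training, 20% validation, 20% test; the fixed seed makes the split reproducible
train, rest = train_test_split(df, test_size=0.40, random_state=123)
valid, test = train_test_split(rest, test_size=0.50, random_state=123)

print(len(train), len(valid), len(test))
# Fit models on `train`, compare them on `valid`, and keep `test` untouched for a final check.
```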
And there are a couple of predictive modeling platforms under multivariate methods, so discriminate analysis and partial least squares. So this is by no means an exhaustive list, and I will point out that partial least squares is also available from fit model. The question is, why do we need so many models? In classical explanatory modeling, we're typically learning about regression analysis, but there are clearly many, many more types of models that we can fit. Well, the reason is that no single type of model is always the best. The best model often depends on the structure of your data. How many variables you have, whether you've got purely continuous Xs or if you've got a mixture of continuous and categorical variables, if you've got non linear behavior. So, so no single type of model is always the best model at predicting accurately. So we generally try several different models and we pick the model or the combination models that does the best job. And we think about predictive modeling in the context of the broader analytic workflow. Of course there's some type of problem that we're trying to solve, a business problem or an engineering problem. But we compile data, and this could be data from variety different sources, and we need to pair these data, we need to curate the data. We explore and we visualize the data to learn more about what we have available to us and additional data needs. And the predictive modeling aspect of this is under analyze data and build models and, in my mind, this is this is where the fun really begins. So the classical workflow within JMP is that you fit a model with validation. And then you save the prediction formula to the data table, or you publish it to the formula depot and we'll see this in a few moments. And then you fit another model. So you might start with a regression model and then fit a tree-based model. And then you repeat this and you fit several different types of models. And then use model comparison to evaluate the performance of each model on the validation data. So remember we use the training data to fit the model, but to compare models to see which one prediction most accurately, we use the validation or the test data. And we choose the best model or the best combination of models. And then we deploy model, so this is kind of a typical workflow. Now, with model screening, what model screening does is it streamlines this. So you can fit all of the desired models within the model screening platform, compare those models using validation data, select the best model. You can explore the models, you can even fit new models from model screening, and then you can easily deploy the best model. So, to take a closer look at this, we're going to use a couple of examples, and the first example is one that you might be familiar with, the diabetes data, and these are in the sample data directory. So the scenario is that researchers want to be able to predict the rate of disease progression one year after baseline measurements were taken. So there are 10 baseline variables and 442 patients, and there are two types of responses that we're going to look at here. The response Y is a quantitative measure that measures how much the disease has progressed in one year. And there's also a binary response, high/low, so high means there is a high...high rate of disease progression and low means that there was a low rate of progression. 
And the modeling goal here is to predict patients most likely to have a high rate of disease progression so that corrective actions can be taken. So, ultimately, what we'd like to be able to do is build a model that allows us to classify future patients, based on their demographic variables and the baseline information that we have here. So we want to be able to accurately predict whether they're going to have a high rate of disease progression or not so that we can then take action. Okay, so that's that's the scenario. And before we get into JMP, I want to talk a little bit about how we decide which model predicts the best. And I mentioned that we use the validation or test set for this, and there's several different types of measures that we can use. For continuous responses, we use RMSE, which is the root main square error, or RASE, which is a related measure, the root average squared error, and this is the prediction error, and here the lower is better. We can also use AAE (average absolute error), MAD (mean absolute deviation) or MAE and for these also, lower is better. So these are...these are...these are kind of average error measures. And then, R square, and for R square, which is really measure of goodness of fit, higher is better. So we tend to use these for continuous responses; there are many other types of measures available. For categorical responses were often interested in the misclassification or error rate or accuracy rates, we might be interested in precision. There's other measures, area under the curve, which we see in an ROC curve, sensitivity and specificity, and these go by several different types of names. And there are a couple of other measures that we've added in model screening that are good when we have unbalanced data. There's an F1 score and MCC, which is Matthews correlation coefficient, and this is a value that takes on... or a metric that takes some values between minus one and plus one, and it's good at measuring when you've got categorical data, how well the data are classified in all four of the possible categories, the true positive, true negative, false positives and false negatives. So let's...let's go on into to JMP. And I'll just open up the data. So these are...and I'll make a little bit bigger, these are again the diabetes data. And if you want to play around with these data, there's a blog that uses these data, and you can also find them under help and then sample data. And again, we have a response Y, let me make it a little bit bigger, which is our continuous measure of disease progression. And then we have a binary Y with outcomes of low or high, and we'll take it...we'll take a look at each of these. So to start, let's take a look at if you...if you had JMP 15 or if you're not using model screening, how would you fit predictive models to predict Y, the continuous response, if you have JMP Pro? Notice that we we have several demographic variables. So there are 10 of these, and there's a validation column, and the validation column partitions our data into training and validation data. So we're going to fit our model using the training data, and then we're going to see how well the model performs when we apply it to the validation data. To create this column, there's utility under analyze, predictive modeling, make validation column, and this allows us to create a validation column. 
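The same comparison metrics are straightforward to compute by hand if you want to check your intuition. The sketch below uses made-up toy data purely so it runs on its own; in practice you would substitute the observed responses and a model's validation-set predictions.

```python
import numpy as np
from sklearn.metrics import (mean_squared_error, mean_absolute_error, r2_score,
                             roc_auc_score, f1_score, matthews_corrcoef)

rng = np.random.default_rng(0)                        # toy data just to make the sketch runnable
y_val    = rng.normal(150, 50, 100)                   # continuous response (validation set)
yhat_val = y_val + rng.normal(0, 40, 100)             # some model's predictions
c_val    = rng.integers(0, 2, 100)                    # binary response (1 = high progression)
p_val    = np.clip(c_val * 0.4 + rng.uniform(0, 0.6, 100), 0, 1)  # predicted P(high)

# Continuous-response metrics (lower is better, except R square)
rase = np.sqrt(mean_squared_error(y_val, yhat_val))   # root average squared error
mae  = mean_absolute_error(y_val, yhat_val)           # average absolute error
r2   = r2_score(y_val, yhat_val)

# Categorical-response metrics at the default 0.5 cutoff
pred = (p_val >= 0.5).astype(int)
misclassification = np.mean(pred != c_val)
auc = roc_auc_score(c_val, p_val)
f1  = f1_score(c_val, pred)
mcc = matthews_corrcoef(c_val, pred)                  # between -1 and +1; useful for unbalanced data
print(rase, mae, r2, misclassification, auc, f1, mcc)
```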
You might have variables that you want to use as stratification columns so, for example, you want to make sure that the data is balanced across the validation sets. I'm just going to click OK here. And this allows us to assign a certain percentage of our data into the different sets. And there's some additional options down here that are useful, so random seed is really nice if you're teaching and you want all of your students to have the same partitioning. So this was used prior to analysis, so let's start by just taking a look at this column Y, the continuous response. So if I were fitting predictive models, a natural starting point is to fit a regression model, and regression in JMP is under fit model. My response is Y, and I'll just fit...start by fitting a main effects only model. I'll input validation here in this validation field and, by default, the personality is standard least squares. But if you're familiar with JMP and JMP Pro, you know that there are a lot of different options available under personality. When I click run, one change in JMP 16 is that if we look at the actual by predicted plot or any of the other plots, notice that there are a points...there are some points here with a V, instead of a dot. This is to indicate that these points were actually assigned to the validation set, so we're basically looking at, you know, how well the model predicts in the random noise around that, and you can see that there's some some degree of scatter. And remember, we're fitting our model to the training data and we're seeing how well it performs on the validation data. So there's some statistics down below. So anytime we've got validation, you'll see some extra statistics for the training and validation sets. And we can see that there are a couple of measures here. There's R Square and there's also RASE. And generally when you fit the model to one set of data and then you you you apply that model to new data, it's not going to perform quite as well on the new data. So we can see that R square is lower for the validation set and the RASE, the error, is a little bit higher. Once...the one thing we see down here under the prediction profiler is that notice...notice that total cholesterol and LDL really kind of fan out at the ends. And as I drag the value of cholesterol, notice how wide this this confidence interval for the mean gets. So as I drag it to the low end, it gets super wide and the same thing as I drag it to the high end. In fact, all these intervals for each of the predictors gets super wide. And this is kind of beyond the scope of this talk, but this is because we don't actually have data or a lot of data out in this region. If I took...take a look and I'm just going to grab these two variables, total cholesterol and LDL as is an example...for the cholesterol and LDL. Notice that there's a pretty strong correlation between these. I don't have any data in this region and I don't have any data in this region. So if you ever fit a predictive model or any type of model where you see these bands really fanning out, it's an indication that you your predictions aren't going to be very precise out in that region. And there's a new feature in JMP Pro, called extrapolation control, that can warn you if this is a problem. So if I drag this out, notice that it's telling me hey, there is possible extrapolation; your predictions might be problematic here. 
And if I turn extrapolation control on, notice that it truncates these bands, and it's basically saying, you can't make predictions out beyond those bounds. It's just not valid. So this is a little bit beyond the scope of this, but if you ever see bands in your prediction profiler that are fanning out, it's often because you're predictors are correlated to one another, and it's...it could be that you just don't have a lot of data to make predictions out in those regions. So it's good to be aware of that. Alright, so getting back to the task at hand, so let's see, I fit this model and reduced the model and comfortable that this is, this is my final model. I'm going to save this model out to my data table by saving the prediction formula. And you'll see that it adds a new column to the data table. And I just built a linear model, so it saved that formula to the data table. And an alternative is to publish this, and I'm going to publish it to the formula depot. So, ultimately, what I'm going to do is I'm going to fit several different types of models (and I'll go ahead and close this) and I want to compare those models so I can use that... I can do that either by saving the formula and going to analyze, predictive modeling, model comparison, and I'll do this in a moment, or by saving it out to the formula depot. And the advantage of saving it to the formula depot is that now I've got scoring code. So, for example, if this were my final model that I wanted to apply to new data, I could copy this script and I can apply it to score new data (and I won't do that here), or I can create code in a variety of different languages, if I want to be able to use this model in a production environment. So that was a simple linear model. What if I want to fit a different type of model? I could fit a more complex model here in fit model. So I can fit a generalized regression model and maybe I want to fit a Lasso model, so I'll click go here. And there's some model comparison right here within generalized regression, but let's say that I want to save this model out, right. And I'm not I'm not paying too much attention to the details here, but I'm going to save this one. I'm going to save the prediction formula out. Maybe I'll go ahead and publish that one as well. Go here, publish this one. And I might repeat this, I might fit a model that's got interactions. I might then proceed to fit a neural network model or partition model, a tree-based model and, if I have a categorical response, this will fit a classification tree. If I have a continuous response, that will fit a regression tree. I might fit a bootstrap forest, which is a collection of smaller trees or boosted tree, so there's several different types of models I can fit. And once I get my models and save the formulas out or published to the formula depot, I want to find which model predicts most accurately. So I'm going to use model comparison. And I'm simply going to add validation as my by variable, because I want to be able to see how well the model performed on the validation data. And just using these two models and looking at the validation data (I'll tuck away the training data), I can see that I have lower average error for the least squares model and my RASE is a little bit higher for least squares model. And my R square is a little bit better for the generalized Lasso. So this is the typical way we might do predictive modeling if we have JMP 15 Pro or JMP Pro 15, or if we're not using model screening. 
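The "publish to the Formula Depot" step has a rough analogue in code: persist the fitted model and reload it later to score new rows. This is only a sketch, assuming a CSV export of the diabetes table with the same column names, numeric predictors, and a hypothetical file of new patients; it is not JMP's generated scoring code.

```python
import joblib
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LassoCV

df = pd.read_csv("diabetes.csv")                        # hypothetical export of the JMP table
predictors = [c for c in df.columns if c not in ("Y", "Y Binary", "Validation")]
train = df[df["Validation"] == "Training"]              # assumes predictors are numeric

model = make_pipeline(StandardScaler(), LassoCV(cv=5))
model.fit(train[predictors], train["Y"])
joblib.dump(model, "lasso_scorer.joblib")               # "publish" the fitted formula

# Later, or in a production setting: reload the model and score new rows
new_patients = pd.read_csv("new_patients.csv")          # hypothetical new data to score
scorer = joblib.load("lasso_scorer.joblib")
new_patients["predicted_Y"] = scorer.predict(new_patients[predictors])
```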
And what model screening does is allow us to do all of that within one step. So I'm going to go into predictive modeling and launch model screening. Model screening, if you look down in the corner here, allows us to basically launch each of the platforms that makes sense for the type of data we have and automatically run those models. So we don't have to go in and select each model from the menu; we can run it automatically from here. I'm going to fit the same type of model, a main effects only model. These are my Xs. I'm going to use the same validation. There are a bunch of options here that we'll talk about in a moment, and I'm going to click OK. And notice I've got a decision tree, bootstrap forest, boosted tree, K nearest neighbors. By default Naive Bayes is off. I can fit a neural model. There are others at the bottom here. Notice that I've got XGBoost; XGBoost is available because I have installed an add-in, the XGBoost add-in. This is a free add-in that accesses open source libraries, and if you've got it installed in JMP Pro, then you can launch it from the model screening platform. I'm going to leave that off and just click OK. So it's running the models, and there it is. It's gone out and actually skipped a step, right; it's run each of these models for me automatically. And there are some simulations used to determine the best starting values for the tuning parameters, so we fit all of these models. And it's reporting out training and validation statistics for each of the models. Now, to figure out which of these models is the best, I use Select Dominant. What Select Dominant basically does is create a frontier across the available metrics, so I hit Select Dominant here, and it finds that, for these data, the best model is the neural boosted model, and the details for that model are up here under neural. It's really just launching the platform and capturing all the information right here within this platform. So all the details are here for neural; it's basically run a TanH model with three nodes that is boosted. I can see the training statistics and the validation statistics, and if I want to see the model that was actually fit, here's the underlying model. So this is my best performing model. If I want to take a closer look at these models, I can explore them from here, and there are some options here; I can look at the actual by predicted plot. I can compare these models, and as expected, there's a little less variation in our training data than there is in the validation data in the actual by predicted plot. I can turn off certain models if I want to explore this. And I can also launch the profiler; if you've got 16.1, the profiler is an option. It's actually a hidden option if you have 16.0, and again, I'm in Pro for this. The profiler is nice because it allows us to explore the models that we fit, so I've selected three models. Let me make this a little bit bigger. Maybe not that big. So I've got a bootstrap forest, which is a tree-based model, where it's basically adding the results of a bunch of different trees together. I've got a neural boosted model, and neural models are really nice at picking up nonlinear behaviors. And I've got the Lasso model, which is essentially just a linear model. And you can see these models pretty clearly.
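Conceptually, Model Screening automates what the rough sketch below does by hand: fit several model families on the training rows, score each on the same validation rows, and rank them (here by RASE). The file name is a placeholder for an export of the diabetes table, the model settings are arbitrary, and the predictors are assumed numeric; this is an illustration of the workflow, not a reproduction of JMP's results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, LassoCV
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error, r2_score

df = pd.read_csv("diabetes.csv")                       # hypothetical export of the JMP table
predictors = [c for c in df.columns if c not in ("Y", "Y Binary", "Validation")]
train = df[df["Validation"] == "Training"]
valid = df[df["Validation"] == "Validation"]

models = {
    "least squares":    LinearRegression(),
    "lasso":            LassoCV(cv=5),
    "decision tree":    DecisionTreeRegressor(max_depth=4, random_state=1),
    "bootstrap forest": RandomForestRegressor(n_estimators=200, random_state=1),
    "boosted tree":     GradientBoostingRegressor(random_state=1),
    "k nearest nbrs":   KNeighborsRegressor(n_neighbors=10),
}

# Fit on the training rows, score every model on the same validation rows
results = []
for name, m in models.items():
    m.fit(train[predictors], train["Y"])
    pred = m.predict(valid[predictors])
    results.append((name, np.sqrt(mean_squared_error(valid["Y"], pred)),
                    r2_score(valid["Y"], pred)))

for name, rase, r2 in sorted(results, key=lambda r: r[1]):   # lower RASE first
    print(f"{name:18s} RASE = {rase:7.2f}   R2 = {r2:5.3f}")
```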
Notice with generalized Lasso, right, I've got this linear relationship between BMI and the response. With neuro boosted, I can see that there's some curvature. I can see it out here with with LTG, and with bootstrap forest, I can see that I've got this little step function, right, so it's actually approximating this this non linear relationship that we have here. So if we decide that the neural boosted model is the best, then we can basically launch the model, launch it from here, and we can either save the formulas out to the data table, or we can publish them to the formula depot for deployment. So it saves a lot of time and still gives us the ability to explore our models. And what if we have a categorical response? So remember, we have Y binary. We can use Y binary in the same way, so I'm going to go to analyze, and again call model screening, and I'll just hit recall and instead of Y I'm going to use Y binary. We can run the same types of models in it. If a certain type of model doesn't make sense, it won't run it. So, for example, least squares regression doesn't make sense here, so it won't actually run it. There are a few options here. I if I select set random seed, like 1, 2, 3, 4, 5, this will give me repeatability and if I launch a model from the details, it'll...it'll have the same results. I've got some additional options here for validation. So it turns out that in many cases, using the classical partitioning of training, validation, and test isn't actually the best, particularly in small data sets. So I might want to use K fold cross validation instead and we've also got nested cross validation. And you can do repeated K fold validation. So this is a nice option, particularly if you have a small data set. And there's some additional options at the bottom. So you might want to add two-way interactions or quadratics. If you know you're missing values, then you might want to use informative missing. And additional methods applies to additional generalized regression methods. And if you're in the JMP 17 early adopter program, you'll notice that this part of the menu dialogue or the the model launch dialogue has been reorganized a little bit to make it flow a little bit better. Right, so so I'm going to just use the default settings here with a random seed and click OK. So it's running all these models. Again, it's very, very quick. Well and now, because I have a binary response, I've got some additional metrics. I've got two R square measures, I've got the misclassification rate, that area under the curve, RASE. Remember this is related to root mean square error. Again, I'll click select dominant and again it found neural boosted, and that's not always the case. If I select what I think are the top models, the the options under the red triangle, you can look at the ROC curves for these and, again, there are check marks that I can turn on and off models. Let me select this. And one of my favorite new features, which is also available from the model comparison platform and in JMP 17 is being added to any any modeling platform that does classification modeling, is this thing called a decision threshold. I think this is one of the...one of...one of the most powerful things that we added in JMP 16. So with these three models selected, it's given me a summary of the classifications for those three models. And I'm going to make this a little bit bigger, so we can see a little bit better. 
So the way we read this is that, if we just take a look at neural boosted, it's showing us that two categories for neural...neural boosted and we're predicting the probability of high. So these are fitted probabilities for high. And the cutoff is .5, so anything below .5 is classified as low; anything above .5 is classified as high. So, if we look at this red block right here, these are all the points that were classified as low, but were actually high, and we can see that number here. There were 29 of those, so these were the false negatives. The lower red box, these were classified as high, but they were actually lows. So these are the false positives, these guys here. And across the bottom are some metrics, some graphs to help us make sense of this, and makes...and helps us to make some decisions. And there's also some metrics, so we can see those same values here, the 29 false negatives, 17 false positives. And we can also see the true positives and the negatives. And what I really like about this is that it allows me to explore different cut offs. So, depending on your modeling goals, for example, we might be interested in having a high sensitivity rate, but we really don't want to have a low specificity rate. Sensitivity rate, and this key over here is really helpful, sensitivity is the true positive rate; specificity is the true negative rate. You know, can we find a cut off that actually does a little bit better at both of those? And by clicking and dragging, you can change the cutoff for classification and try to find a cut off that does a better job. So for example at .26, and I'll just put in a nice rounded number here, at a cut off of .25, the sensitivity is higher, while still maintaining a high specificity. The graphs that I tucked away are really useful in in interpreting these numbers as well, so they give us an overall picture of false classification by threshold, by portion, and then true classification. So if I'm trying to find a point that balances specificity and sensitivity, I can look at this graph. So these top lines are the sensitivity, the dashed lines are the specificity and is there a point where I'm still above a certain value for sensitivity where... sensitivity, where my specificity stays high? And we can...we can balance those out. So if I look right here, I can see this is the point at which both of those lines cross. And this becomes really important if we have a case where we have unbalanced data. So let's take a look at another example. So I'm going to close this out and not save this. And let me close the formula depot, and by the way, formula depot is standalone, so basically it's not tied to the data set that we were using. So it makes it really easy to take a model and then apply it to new data. I'm going to just go ahead and close this. So the scenario here for this next data set, the data called credit card marketing, and the scenario is that that we're doing some market research on the acceptance of credit card offers. And the response is offer accepted. And a caveat is that only 5.5% of the offers, using historical data, have been accepted. This is actually a designed experiment where we're looking at different rewards, different mailer types, different financial information, but the designed experiment is is in reward and mailer type. We're going to focus on this particular modeling goal. We want to identify customers that are most likely to accept the credit card offer. 
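Under the hood, the decision threshold explorer is recomputing the confusion matrix at different cutoffs. Here is a small self-contained sketch of that sweep, using simulated probabilities since the point is the mechanics, not the data:

```python
import numpy as np

def sens_spec(y, p, cutoff):
    """Sensitivity (true-positive rate) and specificity (true-negative rate) at a cutoff."""
    pred = (p >= cutoff).astype(int)
    tp = np.sum((pred == 1) & (y == 1)); fn = np.sum((pred == 0) & (y == 1))
    tn = np.sum((pred == 0) & (y == 0)); fp = np.sum((pred == 1) & (y == 0))
    return tp / (tp + fn), tn / (tn + fp)

rng = np.random.default_rng(1)                 # toy data so the sketch runs on its own
y = rng.integers(0, 2, 300)                    # 1 = high disease progression
p = np.clip(0.3 * y + rng.uniform(0, 0.7, 300), 0, 1)   # a model's predicted P(high)

for cutoff in (0.50, 0.40, 0.30, 0.25, 0.20):
    se, sp = sens_spec(y, p, cutoff)
    print(f"cutoff {cutoff:.2f}: sensitivity {se:.2f}, specificity {sp:.2f}")
```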
And by doing this, we're going to try to apply the model to new customers and, based on the classification for these new customers, make a decision whether to send them the offer or not. So if customers are not very likely to accept the offer, we're not going to send it to them. If they're highly likely, then we are going to send it to them. So that's the scenario. Returning to JMP, we've got offer accepted, and I'm just going to go ahead and run a saved script here while I'm talking about the data. We've got 10,000 observations, so this is going to take a little bit longer; right now it's running those models, and I'm just using the default settings here. So I've got offer accepted; the reward type is air miles, cash back, or points; the mailer type is letter or postcard. And then the rest of these are historical banking information and some demographic information. You see that it's running support vector machines. Support vector machines can be a little slow, particularly with big data sets, and in fact it won't run, by default, if we have over 10,000 observations. We can see we've got average balance. And the data are partitioned into training and validation, and JMP will automatically recognize the zeros as training and the ones as validation. Right, so it's run all of the models. And if I click Select Dominant, we can see that it selected fit stepwise and decision tree. But I want to point out this misclassification rate. This is really something we see very often when we're trying to look for the needle in the haystack. If I take a look at these data, and I'm just going to run a distribution of these data... let me make this a little bit bigger, so we can see it a little bit better. Looking at just the validation data, notice that about 6.2 or 6.3% of the values were yes, they responded to the offer, and that's exactly what we're getting here for the misclassification rate. And to better understand this, I'm going to turn on the decision threshold and make it a little bit bigger. Notice that the predicted yes count is all zero, and this is because the predicted probabilities were all relatively low. Remember that in a classical confusion matrix, the cutoff is .5, and if you have unbalanced data, it's very unlikely that you're going to classify very many of the observations as yes, in this case. As I click and drag, and I get to this first set of points, then I start to see some predicted yeses. In fact, I'll turn the metrics on. When you've got highly unbalanced data, the predicted probabilities might be very low. So having this slider allows you to explore cutoffs that can impact both your sensitivity and specificity. In a case like this, we might say something like, well, if we were to send... I'll just put in .05 here... if we were to send a credit card offer to the top 5% of the predicted probabilities of yes, our sensitivity for the two models is about .72 and .75, and our specificity is still pretty high. So having this cutoff slider allows you to get more useful information out of your model. At .05 I can see I'm getting some true positives, but I'm also getting a lot of false positives, so there's a trade-off here. Now, in a case like this, you may also have some financial impacts, so it makes sense to think about the profit matrix. And there's a button; I click on this.
The profit matrix is a column property for a particular column, and we can set that value, that cut off from the probability threshold, as a column property, or we can also plug in values. You know, in classical scenario is maybe it costs $5 if we choose to send somebody an offer. And if they respond to the offer, maybe it's worth $100. So if we choose to send somebody an offer, and they don't respond, we lose $5. If we choose to send somebody an offer, and they do respond, maybe we make on average $100. And maybe there's some lost opportunity, maybe we say that we lose $100 if they don't respond. And oftentimes the cost of this is, this is just left blank, we just we just plug in values for the yeses. If I apply this and click OK, then some new values are added, right. So, so now we have a profit matrix and if I look at our metrics, then we can see the predicted profits. So on average, right, for each customer in this data set, it's saying that will make $1.50 if we apply the decision tree model and a little bit less if we apply the stepwise model. So, in a situation like this with unbalanced data when there are some underlying costs, it makes sense to use the profit matrix. If we don't have underlying costs, then this slider helps us to pick a cut off that will allow us to meet our modeling goal, okay. It could be to maximize sensitivity, maximize specificity, or something else. So that's that's essentially model screening. Let me close this and just go back to the slide deck. So we've talked a little bit about predictive modeling and what we mean by predictive modeling. And we've seen that you can easily fit and compare many models using validation, and model screening is all about streamlining this workflow. You can still go out and fit individual models, if you want, but model screening allows you to fit all the models at the same time in one platform and then compare these models. For more information, there are lots of different types of metrics available and many, many metrics for classification data. There's a really nice page in Wikipedia about sensitivity and specificity and all of the...all of the various metrics, and I found this really useful as I was trying to dig in and understand how to interpret the metrics that are provided. There are a lot of resources about predictive modeling in general, and also model screening in JMP Pro in our...in our user Community. Again I'll reference STIPS. STIPS is our statistical thinking for industrial problem solving course. And the seventh module is an introduction of predictive modeling, so if you're brand new to predictive modeling, I'd recommend checking this out. And there's also a blog that uses the diabetes data that's posted on our Community. And finally, you know, any new feature will tend to go through iterative improvements. Model screening was introduced in JMP Pro 16. We're already working on JMP Pro 17, and there are a lot of improvements already in the works. So if you're interested in taking a look at what's coming, it gives you an opportunity to provide feedback, you know, reach out to us about joining the early adopter program and we'll get you set up. Thank you for your time and hope you enjoy the rest of the conference.  
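The profit matrix logic can also be written out directly: for each customer, compare the expected profit of sending the offer with not sending it. The sketch below uses the $100 / -$5 values from the example and simulated acceptance probabilities so it runs standalone; the lost-opportunity cost is left out, as the talk notes is often done.

```python
import numpy as np

rng = np.random.default_rng(2)                  # toy probabilities so this runs on its own
p_accept = rng.beta(1, 15, 10_000)              # roughly 6% average acceptance, like the demo data

profit_if_accept  = 100.0                       # offer sent and accepted
profit_if_decline = -5.0                        # offer sent, mailing cost lost

# Expected profit of mailing each customer; mail whenever it is positive
expected_profit_if_sent = p_accept * profit_if_accept + (1 - p_accept) * profit_if_decline
send = expected_profit_if_sent > 0

print("share of customers mailed:     ", send.mean())
print("expected profit per customer: $", np.where(send, expected_profit_if_sent, 0).mean())
```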
Jason Wiggins, Sr. Systems Engineer, JMP   From our homes to the lab to the production floor, measurements systems are everywhere. At home, there is the dreaded bathroom scale. In test labs, a variety of systems are used to measure important physical properties like material strength. Production systems rely on measurements of quality characteristics such as the shape of a machined part. The measurement systems we interact with have a variety of behaviors. They may be noisy or precise, accurate or biased. They may or may not alter or destroy our samples. Regardless of behavior, we need to know enough about the noise in our measurement systems to make informed choices about how they are used to avoid costly errors. In the early 90’s the Hubble telescope’s main mirror was overground due to an errant measurement. The mistake, discovered post launch, nearly sidelined the project. Understanding and addressing the source of variation through a MSA pre-launch could have saved NASA more than $600 million for trip to space to repair it. This presentation will cover the ins and outs of applying MSA results from the Evaluating the Measurement Process (EMP) method pioneered by Dr. Donald Wheeler. An overview of EMP along with examples from home and industry will be explored. EMP gauge classification, reporting precision and guard banding specification limits to ensure product conformance will be demonstrated in JMP.     Auto-generated transcript...   Speaker Transcript Olivia Lippincott Again, the recorder is on ago, often on mute and I'll come back on. Jason Wiggins, JMP Right. Hello, everyone. It's my pleasure   that darn bathroom scale.   So for me it's hard to imagine life without tools that measure stuff, and measuring stuff is a very human characteristic. We just do that.   I find a lot of happiness in measurement systems that are well behaved and most certainly, I get frustrated when when they're not.   There is a cost to developing, refining, and even correctly using a measurement system, whether it's simple or complicated, there is a cost associated with that.   In my experience, this expense is small when compared to the consequential costs of using data from a faulty gauge or even misusing data from a good one.   So a great example of this is the Hubble space telescope.   An aberration in the primary mirror was discovered not long after launch in 1990.   The aberration impacted the clarity of the telescope's images; in fact, this is a before and after.   The source of the problem was traced back to a miscalibrated piece of equipment that was used to grind the mirror.   over $620 million. That's 1993 dollars; that's quite a lot of money.   And I believe this cost could have been avoided completely had a measurement systems analysis been done prior to grinding the mirror.   Now while the Hubble telescope...Hubble space telescope repair is a is a great example of a high cost measurement mishap, issues with measurement systems, even with lower stakes, I feel, can be equally as as frustrating. So take for example, my dreaded bathroom scale.   Last year...December of last year I started on a on a weight loss journey.   And just give you a sense for what I was seeing over time and then from day to day.   I would routinely get four pound jumps in weight. Now I think if if if I'm out here, near my target weight...hovering around my target weight, maybe you know, plus or minus four pounds' fluctuation   isn't an unrealistic thing to expect. 
Now emotionally something different happens early on in this process and   we'll just kind of narrow in and talk through this a little bit. So here's me, I'm working very diligently to dump weight, doing all the right things.   And then I wake up one morning and I'm four pounds heavier. So that's frustrating, right? And hopefully that's something that others can relate to as well, but my question   throughout this (I probably should have done a measurement systems analysis before I started this journey), but my question when I would see stuff like that   is how much of that four pounds is me and how much of it is noise in the measurement system? So I ran a measurement systems analysis to answer the question. Now there's a spoiler alert here.   The first study that I did wasn't very good. The gauge is abysmal. I would have thrown it away, but I like to recycle my electronics.   The end of the day, this was not a good usable gauge.   Part of my goal today is, I mean, this is a fun exploration of a real-life measurement system thing that, you know, maybe we can all relate to, but one of my other objectives is to highlight some of the benefits   of using the EMP method for measurement systems analysis. Now to do this, I'll cover some basic definitions and then demonstrate the method using   results from an MSA from a more well-behaved gauge. Now rest assured, we will get back to my abysmal horrible bathroom scale example here in just a minute.   Alright, so first off, what is measurement systems analysis? For me, a simple explanation of this is that, you know, we're going to run an experiment, and what we want to get out of the experiment is to identify or measure measurement system limitations.   What are our goals? Well, if we do a measurement systems analysis, we will certainly can determine whether the measurement system we're using is good enough   to use or whether it needs improvement. But I feel more important to that and where the EMP method shines   is that we want to set standards for using the measurement data. There's no such thing as a perfect measurement system, so what are some standards that we can put in place that help us use our measurement data effectively?   How's it done? It's a variance components analysis, so we are looking for the variance components associated with the measurement system,   and, hopefully, those are small compared to the measurement variation associated from...with our parts. So we have some total variation of all of our measurements and then there are components, these two different components of variation that we need to understand.   So what is EMP? It stands for evaluating the measurement process. It's pioneered by Dr. Donald Wheeler. There's an awesome book that I'd highly recommend getting.   For me, I think, probably the simplest way to think about EMP is that it's providing a statistical approach for using measurement data effectively. So it's not enough to know whether my gauge is good or bad,   so first class, second class, third class, those are based on probability of detecting a shift in in the process.   The old method, the one that I learned first, the GRR, percent R&R, the AIAG method, uses arbitrary thresholds for gauge classification. So here's a problem with that that...that I encountered and it was what actually brought me to EMP to begin with.   Many years ago, I was working with a destructive test, so this is a measurement system.   
And we did a measurement systems analysis, and it turned out to be a fourth-class gauge by the percent R&R standards. Well, that's something, but what is that telling us? What made that difficult to stomach was the fact that we were using this destructive test to make decisions in product development that were generating revenue. It was a useful gauge, so the percent R&R told us nothing about the utility of our gauge, and it didn't give us anything in terms of recommendations for how we might use that gauge. So I feel that the percent R&R approach really lacks information that I need to use the measurement data effectively. All right, a less frustrating, more well-behaved gauge example. This is a coordinate measuring machine. It measures the geometry of objects, and this is a very simple experiment where you're just measuring one dimension, length. So the lengths of the parts vary, and there are noise variables; you'll notice this little joystick here, so we have operator involvement in the measurement system. I've seen automated versions of this where maybe you wouldn't include operator, but in this case you do. We have variation associated with parts, and then this operator/part interaction. Now this interaction term, for those of you who love DOE, we all know that interactions happen everywhere, but I'll do my best to demonstrate an example of a very common operator/part interaction. What I'm holding is a little metal cylinder, and I have a dial caliper. I'm going to get a measurement on the diameter of this part, and I see it's about 19 millimeters. If I knew what the size of this was, or I had an expectation of what that diameter should be, what I might do is actually apply more or less pressure to the dial on my gauge so that I hit my expected measurement. Now this happens everywhere; I catch myself doing it. These types of things happen every time we as humans interact with measurement systems, so if you have the money and the budget to build an experiment to look for this, I most certainly recommend that. Okay, JMP 16. This is kind of a fun thing; I love this new tool. There is an MSA-specific experiment design tool in the DOE menu for JMP 16. I'll walk through a demonstration of this, but before I do, I want to have a little discussion about randomization. For all of us who do DOE and measurement systems analysis, we know we randomize because we want to reduce the risk of lurking variables impacting our results. That's kind of a given, right? If we don't completely randomize, we're potentially going to run into some issues. What I would also recommend is to make the operator blind to the random run order. So imagine me and my caliper: I'm measuring 10 different parts and I don't ever know which part it is that I'm measuring, so there's less of an opportunity for me to bias the measurement systems analysis experiment just by knowing that information. So randomize, and make it blind if you possibly can. If you run into randomization issues, you could look at a split plot design or some other means of dealing with the randomization problem, but we definitely want to randomize to the extent we possibly can. Okay, let's take a look at it. DOE, Special Purpose, there's our MSA design.
Just like all the other DOE tools in JMP, I can type in the names of my factors, and maybe I need to add another factor for operator here. One thing I really like is that I can give this an MSA role, and this gets copied as a column property in the data table that we're going to generate. So let's say operator has a different name, but I want to have at least some anchor to the fact that I'm using it in an operator role in the MSA, and I can do that. That's pretty awesome. In this experiment, we have four different parts, four different operators, and we're going to run two replicates. What this means is that I have my original run through the experiment, and I'm going to repeat that two more times, so I get three measurements for each run in my original experiment. We're going to completely randomize, and let's take a look at the design. Much the same as in DOE, we get a quick preview of our experiment design; we can look at it and generally make sure it makes sense to us, that it's what we wanted to do and what we're seeing is correct. What I really like about this tool is the design diagnostics, which are a little bit different than what we see in the custom designer, for instance, or in the compare designs tool. There's a simulation running behind this, and we can play with it; maybe we adjust the variance components associated with my test, and there's a simulation that runs in the background. And we can ask some questions: is our experiment design set up well enough that we can tag a first-class gauge, if that's what we have? So, really cool thing. I love it because it is very measurement systems analysis focused, and just like DOE, if we need to add additional parts, additional operators, or additional replicates, we can play with that within the platform without dumping to a table right away. All right, let's take a look at our standard MSA. I want to walk through some basic graphical outputs. We'll do it kind of slow, because we'll be repeating this process a few different times. For this measurement system, the graphical outputs that we get from the EMP report are actually pretty cool. For instance, this average chart: if I were to show the data here rather than the averages, that would be the same result we would get from the variability / attribute gauge chart tool. What it's telling us is that if most of my measurements are outside of my control limits, then we're very likely going to be able to detect differences between parts. Then the standard deviation chart (or the range chart, if I had chosen range for dispersion; we'll talk about this a little bit more in just a minute) visualizes repeatability, so we want everything to be inside the limits here, and that's the case with this gauge. A few more. Here's one of my favorites: again, the operator/part interaction. In this graph, if I have lines crossing, it indicates that an interaction is possible. These lines don't have to stack on top of each other, and they can be a gap apart, but we're really hoping we don't get any crossing of lines in that report. Analysis of means, I like this; you can use it as a screening procedure. It's just telling us that operator one tends to measure low on average and operators two and three tend to measure high. And that may not necessarily be bad, but if it is, we might be able to drill into the source.
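Going back to the design itself for a moment: if you want to build a comparable run list outside the MSA design tool, a small sketch like this creates the fully crossed 4-part by 4-operator study with three measurements per combination, randomizes the order, and adds neutral labels so the operator can be kept blind to part identity. Everything here (labels, seed, the table itself) is illustrative, not what the JMP designer produces.

```python
import itertools
import numpy as np
import pandas as pd

parts = [1, 2, 3, 4]
operators = ["A", "B", "C", "D"]
n_measurements = 3   # the original run plus two replicates

# Fully crossed design: every operator measures every part three times
runs = pd.DataFrame(list(itertools.product(operators, parts, range(1, n_measurements + 1))),
                    columns=["operator", "part", "measurement"])

# Complete randomization of the run order, reproducible via the seed
runs = runs.sample(frac=1, random_state=7).reset_index(drop=True)
runs.index.name = "run_order"

# Blinding: hand the operator a neutral label at run time instead of the part number
codes = dict(zip(parts, np.random.default_rng(7).permutation(["W", "X", "Y", "Z"])))
runs["blind_label"] = runs["part"].map(codes)
print(runs.head(8))
```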
Why is it that operator one measures low on average? Then the test-retest error comparison: we're just showing whether there's an inconsistency in how each operator is measuring.   So those are the graphical outputs. Let's talk a little bit about some additional EMP terms. The foundation of the gauge classification in EMP is the intraclass correlation; that's the proportion of variation that comes from the part.   The within component is associated with our repeat measurements, and operator and operator-by-part are the biasing terms. We're really hoping those are small compared to the total, so higher is better on this statistic, and one is perfect.   And if you're going to go through Wheeler's book and do hand calculations on things like intraclass correlation from variance components, definitely use the standard deviation chart as the dispersion type.   Wheeler has a really good discussion in his book about when you might want to use the range, and we're going to see that with the bathroom scale measurement system. If your precision is fairly refined, you can probably use the standard deviation dispersion type. If it's chunky, range is probably going to be the better approach.   Probable error. We're going to see this in a few different places. There's a lot of utility around it, and I took this from the book.   Wheeler describes this as "No measurement should ever be interpreted as being more precise than plus or minus one probable error since measurements will err by this amount or more at least half of the time." So this statistic is going to be used in a couple of different places.   Notably, we're going to use the measurement increment, which is a function of the probable error, to adjust our precision expectations for the measurement, that is, how many decimal places we want to report.   And we'll see another place where probable error is used: statistical guard banding of spec limits.   Okay, so we'll just go down through the report, and then we'll actually walk through the steps in JMP.   We get a realistic gauge classification, so this is saying second class. Wheeler's recommendation is that you may decide to chart the measurement process, so I may want to run a control before I do this measurement.   It's arguable; again, it's a budget conversation. Can we afford the time to do that? If yes, do it.   If no, what's the cost of improving the gauge? Can we move it to a first-class gauge? How much do we think that might cost as a project?   Then there's this monitor classification legend. Spend a little bit of time with it; you can extract some guidelines for control charting the measurement process, or charting the actual process. Effective resolution: for this gauge, we have to drop digits. I'll walk through how I go about doing that.   Variance components, this is great. Part contains most of the variance, so that's a good thing, although we do still have some operator and some operator/part interaction. We saw that in the graphical analysis as well. Okay.   Now, I built an add-in, and I'm going to use it for demonstration purposes. It's a little buggy; if I can get some of those things resolved, I'll post it.   But this is just launching the EMP platform, and then it's going to give me some other options associated with the EMP report. All right. So we'll just remember part and operator. I'll use standard deviation in this case.
And there we go. Click OK. So these are all the things that we talked about in the slides.   I don't think there's anything different there to see. Now this next piece comes right out of EMP, and what I'm doing is asking: given the noise in my measurement system, what product conformance can I expect with my specification limits, and can I improve that by tightening my specification limits? So I'm going to use statistics to guard band my spec limits. For those who are in high-tech manufacturing, you've probably heard of guard banding spec limits. Sometimes it's done as just a percent.   But I love EMP because I can use the probable error to make a calculation that takes my measurement system noise into account when I tighten those limits. So what this is saying is, hey, for 96% conformance, I need to tighten my bands. I forgot to adjust the decimal places here, but that would be 56 to 74.   So I just tightened it by one millimeter, that's it, on either side, at either specification limit. The black lines are the original specification limits, the ones that we set with our customer.   If we run the gauge at those specification limits, we can expect about 64% conformance, so there's a trade-off here, right?   If we want to tighten the spec limits, it's going to cost us. We're going to be rejecting more parts, but is it going to protect the customer a little bit better? Most certainly. In fact, if we go four probable errors in from the spec limits, we can get 99.9% conformance, but we're going to be rejecting a lot more product, and some of that product could actually be useful to the customer.   The trade-off here is voice of the business versus voice of the customer. It's a business decision, and it's worthy of some discussion before you decide how much you want to tighten your specification limits. Okay, that is the well-behaved gauge. Oh, I forgot, I was going to show how I go about the decimal place precision problem. A coordinate measurement machine spits out a lot of decimal places.   EMP is telling me that I really should drop digits. The way I do it is I just create a formula column using a Round function, and then I adjust the number of places; zero is where I am now, but I actually tried two and one and reran the EMP analysis each time. As soon as EMP tells me that I don't have to drop a digit anymore, that's the precision that I'm going to report.   Okay, and the manufacturing specifications. Again, this is another place where probable error and the measurement increment are used in the calculation.   It's great. It's a statistical approach to something that is common in high-tech manufacturing, guard banding, and we can do it using our measurement system noise to make the calculation. All right, back to the dreaded bathroom scale. This is a little fun.   Let me walk you through the design.   What I think is kind of fun about this is that when we set about to do an experiment, whether it's a DOE or a measurement systems analysis, we have to bring a little creativity to bear. And I had to; I almost got stumped by this, to be honest. So the noise factors associated with me weighing myself, or anybody in my family weighing themselves: well, there's us. I stand on the scale.   Think about the part as my daily weight fluctuation, right?   Let's say I'm at plus or minus four. I ate too much for a week and I gained two pounds.
That's the thing that I want to be able to detect.   Now, getting that is kind of tough, actually, because I have to break out that daily weight fluctuation from the operator part, the me part of it.   There are probably a lot of different ways to do this, but the way I went about it is this: I have my participants in the measurement systems analysis, my family. They would measure their body mass, step off the scale, I would hand them a dumbbell, and they would step back on the scale. I subtract those two readings and I get the weight of the dumbbell, so that will be my part measurement.   Now, to get this on the scale of my target weight, which makes it a little easier for me to understand, I just normalize it.   It's just changing the scale to something that makes a little more sense for me. Okay, let's see how this looks. All right, we'll do the add-in again. There's something here that's worthy of note. If I recall, it's going to remember operator and part again. This is a chunky measurement, so I'm going to use the range chart dispersion type, and the design is crossed.   Okay, right out of the gate, we're seeing that we probably can't detect differences in daily weight. Now, there could be something in here that's associated with how I ran the experiment; that's certainly possible, but it's not looking really good. So let's look at a few others. The range chart, that's looking pretty good. Ah, whoa, there's a problem. My within variation, the piece associated with my repeat measurements, is bigger than the variance associated with my part-to-part variation.   That's not a good thing.   So that's problem two. You can go through some of these others. Test-retest looks pretty good. My bias comparison analysis of means results look good. All right, here's another problem. Look at the size of that probable error. Remember, I'm really hoping that I can see at least a one pound difference in weight and have that be meaningful. Well, the probable error is three times greater than that, and that's a problem.   I'm being told to drop a digit. If I iterate through this, I actually have to go into scientific notation and report two digits to get it even close.   And even then the probable error is just huge, and we're going to see, a little further down, how that can be problematic in terms of using the gauge.   Right, a third-class gauge. So if we had to use the gauge, we would definitely be charting the measurement process, running controls or standards before we do our daily weigh-in. Operator/part interaction, there it is again. This is a funny one, because operator three is my brother-in-law. He's kind of a smart guy, and as much as I tried to make it impossible for him to game the test, I think he was gaming the test.   So that could be associated with my study and not necessarily with the gauge; just my observation. So we'll play that 96% conformance game again, dial the decimal places down to zero. And oh no, I can't compute the limits. In fact, my lower limit is actually higher than my upper limit. And why did that happen? Well, in the calculation   (let's see, if I back up to where I showed that),   if the probable error is too big, then it's going to create a problem, so there is a little bit of a limitation to this tool that Wheeler uses in his book. But for me, what this is telling me is that this bathroom scale is a piece of junk and I just need to buy a new one.
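As an aside, the probable error arithmetic used above is simple enough to sketch in a few lines of JSL. This is only an illustrative sketch, not the add-in from the demo: the column name is made up, the within-gauge sigma is invented (chosen so the probable error comes out near one millimeter, matching the example), the 0.675 multiplier is Wheeler's probable error constant, and the one and four probable error offsets are the 96% and 99.9% guard-band adjustments mentioned in the talk.

// Hypothetical values; substitute the repeatability sigma and spec limits for your own gauge
sigmaWithin = 1.5;                      // within (test-retest) standard deviation
PE = 0.675 * sigmaWithin;               // Wheeler's probable error
LSL = 55;  USL = 75;                    // customer specification limits implied by the 56-to-74 example

// Guard-banded (tightened) manufacturing limits
LSL96 = LSL + 1 * PE;   USL96 = USL - 1 * PE;     // roughly 96% conformance
LSL999 = LSL + 4 * PE;  USL999 = USL - 4 * PE;    // roughly 99.9% conformance

// Rounding exercise: add a formula column and iterate on the decimal places
dt = Current Data Table();
dt << New Column( "Length (rounded)", Numeric, Continuous,
	Formula( Round( :Length, 0 ) )      // try 2, then 1, then 0, rerunning EMP each time
);
Show( PE, LSL96, USL96, LSL999, USL999 );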
So I'm kind of cheap,   but I decided 50 bucks is probably not an unreasonable price, so I bought a Fitbit scale, and let's see what that looks like.   Run the study again.   We'll keep range the same. All right, already we're seeing a little bit of improvement. It's not as good as my coordinate measurement machine, but some of my measurements are actually falling outside the control limits, so I would count that as an improvement.   Also, look at that: my within variance component is actually less than part. It's still kind of big,   but for 50 bucks... Maybe if I spent $100 or $1,000, I could get that down a little bit, but I think I'm willing to live with it. All right, probable error, hey, we're at least under a pound.   And of course, we need to consider dropping a digit, so I would go through that rounding exercise with this as well, and it turns out that reporting zero decimal precision is what we want to do. And let's take a look at this, 96% conformance.   This time I actually get something that's reasonable. So for me,   what I set as my expectation going into this whole exercise of losing weight was that I wanted to be at a target weight of 180,   and I'm willing to live with plus or minus five pound weight fluctuations. As I've gone through this journey, I've noticed that the things I eat can change my water weight within a day, so that's a pretty reasonable expectation. Now, if I want to be sure that I'm staying true to my goal, maybe I need to bring those specification limits in by a pound.   So what started out as a kind of frustrating exercise, through a fairly long journey of losing weight and working with a horrible scale, has arrived at something I can live with and that I'm actually kind of happy about. So   that's all I have. Thank you. Hopefully this has been an entertaining walk through the EMP method of measurement systems analysis.
Bradford Foulkes, Director of Engineering, Optimal Analytics   After spending weeks or months pulling together data and building reliability models, often the feeling is "Now what?" Or maybe the question is, “How many will fail?”  In JMP Pro, there is a platform that can answer these questions, with some tweaks. The repairable systems simulation (RSS) platform allows you to enter a reliability model, or a system of models, to see how frequently the event will occur and what the impact could be. In this presentation, I explain how to go from reliability model to event prediction, via a method to automate the generation of the RSS platform. Once you have the output, I show how to build reports to answer questions around event prediction, annual downtime values, and individual models in a system. This presentation covers hands-on examples, as well as how to use JSL to make the platform and report generation easier.     Auto-generated transcript...   Speaker Transcript Brad Foulkes Thank you for attending my talk on But What Do I Do Now? Using a Reliability Model to Make Event Predictions in JMP.   Very frequently I'm asked, once I build a reliability model, what do I actually do with it? And oftentimes people forget that you can use a reliability model to make predictions about when events will actually occur.   And not just the first event, even; maybe the second, third, or fourth event that the product might see over time.   So a little bit about myself. My name is Brad Foulkes. I'm the director of engineering at Optimal Analytics, where we work with businesses to try to help them understand their data, identify roadblocks, and get them moving more efficiently.   Prior to that role, I worked in reliability engineering for about nine years, working on how we can understand part reliability and figure out when parts will actually break over time.   I've been using JMP for about eight years, I think since around JMP 10; I'm not sure, it was a while ago, so I forget what version I was on then.   Among my favorite JMP tools, I like to use Life Distribution a lot, because that's where the reliability modeling sits, Fit Model, which I use very frequently, and JSL.   Scripting can open up a whole set of new tools and new toys for you to play with. So why do I want to present this today?   Because, more often than not, this question comes up at the end of building a model and isn't really thought about up until that point.   You might finish a model and have a whole bunch of probabilities and then not really know what to do next. So here's one option for what you can do once you have built your model and want to continue forward.   So, what we're talking about today is event prediction. When you build your model, you're used to seeing this kind of plot. You'll have your individual times,   and maybe these are hardware or product life cycles. Let's say it's a blender, or any kind of small kitchen appliance that maybe doesn't last   a terribly long time. Some of these are going to fail very early; some will fail very late.   But at the end of the day, you end up with a probability of failure, of when these things are going to occur. So here you might come over and say, okay, about 90% of the failures are going to occur by four years.   Well, that's great.
That helps you to understand a first failure and maybe put a probability on parts failing in the field, but for a person who actually has to buy these things or use them, this doesn't help a whole lot.   A customer that's actually using these doesn't care how long a single part is going to last. They might want to know how many they will need to buy over a period of time,   maybe over an overall lifecycle, or they might only want to look at 10 years: how many of these do I need to buy in 10 years to get through it?   So that's where event prediction comes in. And what event prediction really is, is taking your distribution and flipping it around, so you still have your distribution of time,   the times to events that you have here, and then the probability of an event. And so now   you can look at how frequently some of these events occur. So maybe the first event occurs here around eight and a half years, the second event occurs at nine and a half,   the third event... but this is all one customer. This is all one customer, one use case, and so, for them, their mean time to failure   looks at all of these parts, all of these intervals and these replacements. So this here   is 20-plus years of time that a customer is actually looking at. They're going to want to know: to survive that 20 years, I'm going to need at least three parts.   And so, using some of this random selection and simulation, you can put together an understanding of how frequently these parts need to be replaced and how frequently a customer might need to purchase your product.   So how is this practical? It really comes down to replacements,   replacements being just how frequently you are putting a new part, or an as-good-as-new part, in there. There are   repairable systems and non-repairable systems. A repairable system is one where you can get things just up to the   point that they can run again, and maybe it's not as good as new. So let's say you've got a hole in your tire. You plug the leak, and you keep going.   That's repaired. But what if you could repair the tire to the point that it was as good as new and it's never going to fail again?   Now with the tire, maybe that's not all that practical, but with other things, you might be replacing a part, or maybe you need to cut off a portion of a part and replace just the piece that failed.   Getting a part to as good as new can make this a non-repairable system. So even though you performed a repair, making it as good as new   changes things a little bit. Now that's a giant caveat; you do have to give a lot of consideration to what as good as new means. But very frequently, people deal with parts that   they've repaired as good as new and think, well, this is a repairable system, when for the reliability world, it's not. All right, I'm off that soapbox for a bit.   Back to replacements. When we're looking at a single system,   you might look at something here and ask:   for one single system,   how frequently are parts failing? If we're looking at just one customer, how frequently are they buying them? The first part lasted four years here.   Then they had to buy a second one. That one only lasted 1.3 years, then they had to buy a third one, and this one went gangbusters and lasted eight years.
So, at the end of the day, they've had to buy three parts in 12 years, but there's wide variation. You see this with all sorts of products, with some cell phones, with   other home products. Some things fail early; some things fail late.   You might just be talking with your friends and say, hey, I have this horrible cell phone, I can't believe it failed this early, and they'll be talking to you about the same cell phone that they've had for 10 years. Granted, probably not 10 years, but   it could be a while. So at the end of the day, for this one system, this one customer, they've had to replace a part three times.   Now, if we look at multiple systems, we can build an idea over time of how frequently these things occur. So   one customer might have that first system, and then a second customer, maybe they had five or six replacements over roughly that same period of time.   So if we were to look at the first system, we might say, okay, we wouldn't expect any failures until at least year four here.   But if we look at the second system, they had a failure in year one. So if we're trying to figure out the average failures, or the average number of events that we might expect in year one or year two,   that first system doesn't tell us much. So this is where, if you don't have the data but you have your distribution and your model, with the beta and eta for your Weibull or something like that,   you can use simulation to lay out how frequently customers should expect events in year 1, 2, 3, and going out much further than the data that you actually have.   By looking at those multiple systems and looking at that simulation, you can start to see a trend of how frequently   these parts are going to be replaced year over year. So while this data here may have only been built off of event times that were seven years long,   for calendar time, or for a customer's time, there's going to be a kind of spike, and then it will level out to a flat level of replacements. Now,   with a simulation, you're going to see some bouncing, and that's what you see here. If we had a closed-form solution to this, we would have a nice straight line, but   with the simulation, we do get some variation. So you can draw a line and see that, on average,   a person might expect to replace about one-third of these a year. So after three years, you could expect to have a replacement, once it gets going. A few in the beginning and then it levels out.   So let's look at an example of what I mean by this.   We have a table of data here, and here I've got a model. This model is just a random Weibull. If you're not familiar with the different functions that you can use in JMP,   you can use Random Weibull, Weibull Quantile, Weibull Distribution, and Weibull Density, and the same kinds of functions are available for other distributions.   In this case, we're choosing a random set of points off of a particular distribution. My distribution has a beta of 1.2 and an eta of 3.   These random draws   are all individual event times, and this is what would end up going into your model. If you were to build a model on part failures, these are the times that usually end up going in there. What gets lost, though, is how this actually affects the customer.
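For reference, generating a column like this by hand takes only a couple of lines of JSL. A minimal sketch, assuming the shape (beta) of 1.2 and scale (eta) of 3 from the example; the table and column names are invented:

n = 20;                                    // number of individual event times to draw
dt = New Table( "Simulated Individual Times",
	Add Rows( n ),
	New Column( "Individual Time", Numeric, Continuous,
		Formula( Random Weibull( 1.2, 3 ) )    // Random Weibull( shape, scale )
	)
);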
So over time   the customer maybe wants to know how many of these they might replace in 20, 30, 40 years.   And that's where you need to look at the total system event times, which are just the cumulative sum of the individual times, so 8.3 is the first, 1.2 is the second, and so on.   When you go to the second system, we've got 1.8, 3.9, 5.9; it's just a cumulative sum of the individual times.   The thing to keep in mind here is that when you're trying to figure out an average of all this,   you would be looking at the system number, and then at the year this actually occurred, so 1.8 years occurred in the second year of operation.   That's an important distinction to bear in mind, because you want to make sure that you're identifying the failures in the appropriate year.   So your 0.6: if that were the first event, it would be in year one, but here it's really just adding to the running system time, so it lands further along in the system years.   Having this, you might think, okay, now that we've got all this, we can count up our number of events, look at the system and the year, and perform a nice little analysis. Well, JMP actually has a nice tool that will do all of that for you.   In JMP there is a tool called the repairable systems simulation.   I believe it came in around JMP 13, maybe JMP 14; it's been around for a few years.   I don't know that it gets nearly as much publicity as it should, because it is an incredibly powerful tool that I think a lot of folks don't appreciate.   So what it is, is you're looking at a system here. Right now, I've got two parts in here, but maybe my system is actually   seven or eight parts, or it's an entire gearbox, or a pump, or something like that. You have all these parts that work together, such that if any one of them fails,   the entire system fails, and so you want to understand what the events and the event times are. In each one of these blocks, you lay out what your Weibull, or whatever your model is, and here I've got a beta of 1.2 and an alpha of 15.   The time unit is years, and I can certainly choose different distributions if I would like.   And then I can say what happens if an event occurs. So here I've got a block failure: what happens if a failure actually occurs?   The outcome is to replace with new, so I just want to replace the part. There are many different options here; maybe I could do a minimal repair, but instead I've said I want to replace this with a brand new part, so this is going to be a like-new model.   And then you can include an amount of downtime. How long does it take to repair this? What is the mean time to repair? It can be a constant value,   you can say immediately, if you don't care about the amount of time it takes, you can give it a list to choose from, or you can give it a couple of distributions as well.   So with these tools, I've now built a very simple system here. I've got two parts with known distributions.   When either one of these fails, I have a replacement time, and the second one uses the choices option.   And you can see how you put it in here. Now, the choices need to be in the same time unit as your simulation,   and I've chosen years for my simulation.
This is in years, and the time to repair something might only be an hour, two hours, maybe a day. When you put that in units of years, the numbers get really small, and that's what you see here.   I'm not going to go into all the options. As you can see, there are many other things that you can do, with a standby, a K out of N; a lot of those are for more complex analyses.   All right, so I run my analysis, and I'm going to run it for 20 years, which is listed here, with the number of simulations set to 100. You will often want more than that, but for demonstration purposes, we're going to set it at 100 and use a seed of 1234.   Then I get this nice big output here, and the initial output is always called number, which is very helpful.   But you might look at this and go, okay, well, what do I do with any of this stuff? You can come over here, launch an analysis, and   maybe look at the total downtime by component. That tells you the distribution of downtimes that a component might see,   but it never actually answers the question of, over 20 years, how many times am I going to need to replace this part?   So I wrote a script, and when you download this journal, it will have all of these scripts in it.   I set it up as a button, but the actual script is right underneath here for your reference, if you would like to use it.   So I'm just going to click on the button and a whole bunch of things are going to happen. I've now gone in and calculated how frequently I expect these events to occur, year over year. From the system perspective,   I've subsetted my failure events and set my year value. You might recall that the year value rounds up, so 3.8 years is year 4.   Now, this looks at calendar time, the system time that we might be using. It doesn't look at the individual time, the time between each of these; for row one to row two that's about 1.8 years,   but because we're looking at the system, we're actually counting the 5.69.   Now I want to turn all these values into something useful.   The important thing to remember, though, is that when you're looking for an average number of events every year, you need to include the years that didn't have events occur. So in this first simulation, the first event occurs in year four,   but I need zeros for years one, two, and three, and that's what my script does. My script sets up a   pattern of what it is that I want to look at: part one, the year value, and the simulation ID, and then when the events actually occurred. Here you can see the first part one event occurs in year ten,   and when that gets added here, that's what gets updated in the analysis. If I scroll down to part two,   for simulation one,   there are no events.   The first event for simulation one isn't until   year four. So if I scroll all the way down to year four   for part two, it shows that I have one event in simulation one. It's really just lining up these events for every possible combination of years in which they could occur.
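A rough JSL sketch may help make that bookkeeping concrete. This is not the script from the journal, just an illustration under assumed names: a results table with Simulation, Part, and System Time columns, 100 simulations, a 20-year horizon, and a part called "Part 2". The key points are the same ones described above: the year bucket rounds up, and years with no events still count as zeros because every year has a slot in the vector.

dt = Current Data Table();          // assumed columns: Simulation, Part, System Time
nSim = 100;                         // number of simulations that were run
nYears = 20;                        // simulation horizon in years
counts = J( nYears, 1, 0 );         // events per year for one part, all simulations pooled

For( i = 1, i <= N Rows( dt ), i++,
	If( dt:Part[i] == "Part 2",                  // pick one part to summarize
		y = Ceiling( dt:System Time[i] );        // 3.8 years falls in year 4
		If( y >= 1 & y <= nYears, counts[y] = counts[y] + 1 );
	)
);

meanPerYear = counts / nSim;        // average events per system per year (zero years included)
cumulative = meanPerYear;           // running total = expected replacements to date
For( y = 2, y <= nYears, y++, cumulative[y] = cumulative[y] + cumulative[y - 1] );
Show( meanPerYear`, cumulative` );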
Well, having that level of information across all of the simulations, you saw I had 4,000 records; that's just all the possible permutations of year, simulation, and part.   But really I want to average those down. So I average across the simulations, which are the rows in each group, keeping the year value and the part, and I get down to 40 rows, far more manageable and far easier to understand.   What you see here for part two is that little spike up in the beginning, and then it levels off. And   it bounces around because it's a simulation and we only ran 100 simulations, but we have a kind of steady state of replacements that we might see.   If we look at this from a cumulative perspective, though, this is what can tell a customer how many they might expect to replace in 10 years. So for part two,   let's say the customer wants to know when they'll need to replace the very first one. The very first one   is going to happen between years four and five, just after year four. The second one happens at about year seven.   The third one happens at about year 10. So you have this bit of a climb, where there aren't a lot of failures up front, but then there's a steady state, and every three years or so you end up needing to replace the part.   This can be very helpful for the customer, for planning in your business, or even for supply chain efforts, if you wanted to try something like that.   So using this event prediction from the repairable systems simulation is a handy and quick way to give that understanding of how many parts you'll need, not just the probability of failure and when it might occur. So   one other thing that I want to talk about today is a way to make the repairable systems simulation a little easier.   It may be daunting; it's a new tool, and you may not know everything that you're working with, but   if you start with a template, start with a table, building your repairable systems simulation can be a lot easier.   In the end, to build your model, you need roughly 13 pieces of data. Most of this goes on in the background and you wouldn't even think of it, but you can lay these out into a template table   with these various columns here. And I say 13, but you actually need less; I've made some modifications here to handle some neat little things that you could do.   Really, you have your distribution, your model type, your parameters, and what time unit. These are all things that are either defaulted in JMP, which you can change, or   things that you would add in. So each one of these has an event outcome: if the block fails, do you replace with new? These you can adjust, and you can connect these event names and these processes differently.   In my world, a lot of times we'll see a change of behavior. Maybe we want to capture that in the first few years this is the replacement rate, and then in the next few years, here is the replacement rate.   Or you might have competing failure modes that you want to take advantage of. What I'm going to show you here can handle a lot of that.   Excuse me, this is by no means perfect.
This   should definitely be adapted for your own use case, but, at the end of the day, starting with a template is an incredible way to speed up your development and   to allow you to make small adjustments on the fly. So I've got a nice little script here where, if I click on it, it produces my repairable systems simulation for me.   And if I look over here, I've got my first part with a beta of 1.86, or .826, and an alpha of 153.   It's really just taking this table and turning it into the repairable systems simulation. You can do   something even more complex. Let me close that one down.   Here I've got 14 parts. I could change any of these distributions if I want,   but I've also got other types of events that I want to look at. I want to split my distribution where   it changes shape after a certain amount of time. Maybe I'm trying to model a product introduction, a product upgrade that is going across the fleet. Or maybe I've got competing failure modes in here and I want to take that into account.   Using this template and the script allows me to go and handle all of these in a very simple fashion. Now, you saw how quick that was. What was that, a second or so, to   generate this model? If I want to create another one of these, maybe I want this one to be a   lognormal.   I can do that right here, and then   come over to my script   and   just run it again. And I get another model where (let's see, what was it? Part five.)   part five is now a lognormal. It's really easy to try different iterations, try different models, and it generates a new one every single time,   labeling the diagram from your table here. So it's a handy way to perform the analysis and understand your event predictions over time, using this template and the script that's embedded in here.
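To make the template idea concrete, here is a hypothetical sketch of what such a driver table could look like in JSL. The column names, the values, and the one-cell edit are all invented for illustration; the real template in the journal has its own layout and is read by the accompanying script that builds the Repairable Systems Simulation dialog.

template = New Table( "RSS Template",
	Add Rows( 2 ),
	New Column( "Part", Character, "Nominal", Set Values( {"Part 1", "Part 2"} ) ),
	New Column( "Distribution", Character, "Nominal", Set Values( {"Weibull", "Weibull"} ) ),
	New Column( "Shape (beta)", Numeric, "Continuous", Set Values( [1.86, 1.2] ) ),
	New Column( "Scale (alpha)", Numeric, "Continuous", Set Values( [153, 15] ) ),
	New Column( "Time Unit", Character, "Nominal", Set Values( {"Years", "Years"} ) ),
	New Column( "On Failure", Character, "Nominal", Set Values( {"Replace with new", "Replace with new"} ) )
);

// Trying a different model for one part is then a one-cell edit before rerunning the generation script
template:Distribution[2] = "Lognormal";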
Steven Crist, Analytics Consultant, Wells Fargo   It is well known that optimization of the layout and content of webpages can be achieved through thoughtful pre-test design of experiment (DOE), post-test analysis, and identification and productionization of a winning variant webpage.  The present use case demonstrates the use of the JMP custom DOE platform to create a fractional factorial multivariate DOE for a financial services checking account webpage that effectively managed business constraints while providing the necessary data that led to a 7% increase in application volume as compared to the legacy webpage.  Additionally, leveraging the JMP partition model platform, an additional key insight was discovered: visitors who clicked on the ‘compare accounts’ link were 40% more likely to submit an application.  The ‘compare accounts’ insight was not the main inquiry of the original test but provided guidance for future testing to further optimize the webpage and resulted in an additional 4% lift.  The presented use case demonstrates the effectiveness of the testing continuum: a test leading to actionable insights resulting in the next optimization test, and so on.     Auto-generated transcript...   Speaker Transcript Bill Worley hi. Steve C I'm Steve Crist. Thanks for joining me today as we go beyond A/B testing and look at a use case of multivariate testing and advanced analytics for web page optimization.   The outline for today: we'll start with a high-level overview of the entire use case, then dive into the details. We'll take a look at a JMP demo   to see how we evaluate a specific design of experiment.   We'll also look at the results we got from the test and how we were able to leverage the JMP partition model to enhance those insights.   From a high-level overview, this use case consists of two tests. In test one, we looked at some components of the banner design and the body of the page layout.   We had a successful multivariate test that resulted in a 7% lift in applications. We were able to take that one step further   to uncover an insight about some content that had gotten pushed down to the bottom of the page, which we found out was very important, and we were able to leverage that in test two, increasing our   performance on top of the first test winner by an additional 4%. So the use case highlights how the JMP DOE platform and its custom DOE capability helped us enable test one, and then the JMP partition models helped us extract   additional insight we were able to leverage into test two.   So let's get into the details.   When we look at our current page, this is the Wells Fargo checking homepage.   Our marketing and business partners have done a great job of getting some voice-of-customer feedback, so the main motivation for this test was around this Body Style A, or Page Layout A,   versus B, where our customers had said that they wanted to see more products surfaced higher up in the experience. And our marketing and business partners said,   while we're making this change, we also think this image that we've had for a while could be better.   We also think that this banner design, where we bring the content to the left, where people's eyes are more naturally drawn, and make the banner physically smaller, can surface the content that our customers and visitors said they're more interested in. So our partners came to us with an A/B test.
We only have two of the eight possible combinations   covered, and the conversation we had with our partners was that this design looked great, but we have some risk.   That risk is that some of these components may work well and some of them may detract from performance and cancel each other out. And so, if we only run an A/B test,   we won't know which components do and don't work well. That's where we proposed a multivariate test, and our partners said, that sounds great, let's do that, but we had some business constraints that we needed to manage.   So what were those business constraints?   As I mentioned, the main motivation was around this body layout and bringing the new content to the page.   And so, when we look at the overall design space, our partners had no interest and no appetite, really, to test any of the other variants that would have the old body page layout. That was   key factor number one.   Secondly, this block represents a page that had the old image with the new layout, and there wasn't much interest   in doing that either. So in this particular case, our partners were very prescriptive about the tests that they were comfortable running.   The natural question then is: does this test work as a holistic multivariate test design? And that's where the JMP custom DOE platform comes in. So let me switch over and we'll take a look at how we did that.   We're going to go to DOE and use a custom design.   We have three factors: our image,   with A and B versions, another two-level categorical variable of banner, with A and B versions,   and lastly, our body page layout, with A and B versions.   And   when we click continue,   we're looking good so far. What we notice here is that we can cover this four-cell design space in as little as four cells, which works out well   for us in this situation, but we know from experience, and people who are familiar with this will know, that the   JMP optimization engine will never give you this design as a four-cell fractional factorial. This is not the most optimal design   for four cells, but it was the design we were trying to work with to manage our business constraints. In order to force this particular design, you can use disallowed combinations   to essentially specify every single cell that we don't want to run, so that we're left with what we do want to run. Within JMP, there isn't a way to specify an exact test design, but you can leverage the disallowed combinations script to force it.   And this is something that we do very often,   because in our world of web page testing, we get a lot of input from our partners. This is a fairly typical use case for us, and so this is a technique we use often.   Because we're dealing with categorical variables, just to explain the script here, we have to code them, so value A gets a 1 and value B gets a 2. So when we look at this particular cell that we don't want to run,   this is image A with banner B and body style A, so ABA, or 121.   And this cell here is image B with banner A and body style A, or 211. And in that manner, you can specify all four of the cells that we don't want to run. So then we click make design.
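For reference, the disallowed combinations script for a test like this is just a Boolean expression over the coded factor levels. This is a sketch under assumed factor names (Image, Banner, Body); the first two cells are the 121 and 211 combinations spelled out above, and the remaining two follow from the same constraints, the other old-body-layout cell and the old-image-with-new-layout cell.

(Image == 1 & Banner == 2 & Body == 1) |    // A B A, coded 121
(Image == 2 & Banner == 1 & Body == 1) |    // B A A, coded 211
(Image == 2 & Banner == 2 & Body == 1) |    // B B A, the other cell with the old body layout
(Image == 1 & Banner == 2 & Body == 2)      // A B B, the old image with the new layout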
It takes just a second to render.   The fact that we don't immediately get an error saying it does not converge is a good sign, and in fact   JMP says yes, this design will work. So here's our control page, represented in run number one here; that is AAA: image A with banner A and body style A. Here in run three is the   variant that our partners came to us with in the original A/B proposal, with image B, banner B, and body B.   And then we also have variants one and two represented here, BAB and AAB, so we get back what we expected to get back. I'll comment here for just a moment about the evaluation. One of the things that I typically look at is the color map on correlations   and the design diagnostics.   For those who are familiar with these, you'll notice that   this is, as I mentioned, not the most optimal design, but it does work, and that's where the science and art of this   come together: we have to make it work with our partners. To the extent that it does work, that was the most important factor for us.   Also, knowing that we have a fallback, that we have the actual data, we could do some piecewise comparisons if we want to, but there's a lot of efficiency gained   by running a multivariate, because we can infer some of these results. I'll make a comment here too, not the main point of this conversation today,   but a well-constructed and properly analyzed multivariate does not take any longer than an A/B test, which is very counterintuitive at face value. I wanted to make that statement up front; if you're interested, please ask me during the Q&A or reach out to me offline.   So, coming back over to the presentation.   We just looked at this design and were able to evaluate it; we now know that it works as a holistic fractional factorial multivariate. So what did we learn?   We learned, as we saw in the overview, that   with variant two we had a 7% lift. That was our highest performing page in this design space, and because we had a holistic multivariate test, we were also able to leverage some regression techniques to infer, to do a look-back analysis of,   what would have happened if we had run some of these other pages during this test. So not only did we have the actual values that we could   validate and verify with regression analysis, with good agreement, but because of that we can also calculate and look back on how those pages would have performed if we had tested them. That's one of the   powerful things about multivariate testing, especially if you're in a situation like we are, where you need to develop and create all these different pages. We can leverage this technique   to save ourselves quite a bit of work up front, yet still extract the learnings. It also helps isolate   what's driving the performance. And you can see here that if we had done the original A/B test,   our variant would have suppressed performance. And if we hadn't run a multivariate, the interpretation would have been that this new concept and design that our customers   said they would like, they didn't really like. But in actuality, you can see that   it's all of the banner B versions that suppressed performance. That new banner style didn't particularly resonate with our visitors   and our customers.
And so we were able to tease those effects apart. Not only did we have a better winner going forward, but we were able to easily understand that this design   our customers said they wanted is, in fact, what performs well.   We could have just as easily stopped there and said, great, we had a successful test, thank you, and we're done. But are we done? We have a lot of data. Is there anything else the data can tell us?   In our reporting, one of the things we noticed was that we got a lot less engagement and a lot fewer clicks   on this compare accounts link and the find the right account link, which takes you to a guided experience. In the control experience,   that content was much, much higher up; it was right under the banner, so, not surprisingly at all from a click perspective,   we saw a lot fewer clicks on it here. That's pretty typical of any web page: the further down the page, the more people have to scroll, the less they click on it. That's pretty intuitive, but we asked ourselves the question because it was such a stark contrast.   Is that a problem? We think these experiences and this content are pretty good, and we were kind of surprised that it dropped that much. We expected it to drop a little bit,   but it dropped a lot more than we were expecting. Is that a problem? To help us understand that aspect of it, we used the JMP partition model, and in particular the decision tree algorithm.   So we took variant two and we analyzed for application submissions to try to figure out what content, when it was or wasn't clicked,   helped increase submissions. And we saw separation, with a higher application submission rate.   At the top of the tree, the most important thing, again very intuitive, not at all surprising, is that people who clicked   on some content on the page, versus people who visited but didn't click on anything, had a 20X higher app submit rate.   Very straightforward. They're motivated, they're engaging with your page. So they clicked, but what did they click on?   The next node of the tree is the apply now and open now content. People who clicked open now or apply   directly from this page, versus people who clicked on something else but didn't click apply, those apply clickers had a 7X higher app submit rate. Again, very straightforward, very intuitive.   But what we were surprised about was that, if they didn't click apply now, there are a lot of other paths on this page: you can   click into the product details page and apply from there, or you can go into the compare experience, for example, and apply from there. You can go deeper in the experience and still apply. And so the second most important content on this page, other than open now, is the compare all accounts   CTA. And we had just buried it down at the bottom of the page. In fact, the compare all accounts link is just as important as the content in the banner, which was very surprising to us.   So we were able to have that conversation with our partners and leverage this insight: when people click compare, they're 40% more likely to submit an application. We need to test whether we can put that back up higher in the experience and what that does for us in terms of performance.   And what we found... So here in control, we had the compare content and the product selector content, both at the bottom.
My personal proposal was to just put them both back at the top, but luckily we'd educated our marketing and business partners enough that they said, well, let's run a multivariate to find out: do we need them both at the top? Is there a preference,   in terms of our visitors, in terms of performance and site conversion? Should we just have one? Is some of this content distracting?   We found out that it was. Our best performing variant, as we mentioned at the top, was the one that had the compare content at the top   and the selector down at the bottom. The selector, in particular, fewer people were   interested in, and so, by having it at the top, it was a bit distracting. We were able to really clean up and optimize our experience by keeping just the compare content back up at the top.   So, going back to the high level here, we had two tests, and we were able to use the JMP DOE platform in the first test   to help us navigate the multivariate testing discussion,   and then use the JMP partition model to uncover that insight and feed it forward. Between those two tests, in a fairly short period of time, we were able to increase our site applications by 11% in total.   And so, in conclusion, multivariate testing,   in general, is a very effective method for isolating and understanding what's going on with your test. As most people know,   once you start changing too many things, you lose some level of insight.   I'll also say that, in financial services testing, one of the key differences is that   people may be asking themselves, why didn't you just do a bunch of sequential A/B tests? Part of the answer lies in being in financial services,   and I think other industries have this as well: every change to a page needs to go through a level of legal, risk, and compliance review. And so   for us, the multivariate method is very effective, because we can go through that process once. Sequential A/B testing ends up being very cumbersome, and we can   move through testing much more quickly with this method. And as it relates to JMP, JMP's DOE platform is really best in class.   It's so visual and so impactful in helping navigate those conversations with your partners, because,   as an analyst, this is something I do on a daily basis, but it's not something that our marketing and business partners really think about very often.   The JMP platform is instrumental in helping us navigate that conversation and articulate why this is the best path forward.   And again, the flexibility of the disallowed combinations script gives you the ability to evaluate specific designs and navigate and manage those business constraints.   Then, taking all of that one step further, the partition model and the decision tree analysis help extract even more insight, so that   we can stay on this continuum we always strive for, where we run a test and hopefully have some impactful results and learn something,   but we also learn something we didn't really appreciate before; really, we're looking to get that insight to know what to test next.   So that concludes the presentation.
I want to say thank you to all my testing colleagues at Wells Fargo for all their leadership and collaboration,   spanning the range from project management to web page development to our quality assurance team members. So thank you all, and a special thanks to Rudy for his   friendship as a longtime colleague, and for his guidance, expertise, and tenacity in always helping the team strive to do better. Thank you for attending today.
Thor Osborn, Principal Systems Research Analyst, Sandia National Laboratories   Talent acquisition is a critical element of the talent management cycle. Employment application arrival is a stochastic process that poses limitations and delays on the subsequent vetting, decision making, and negotiation steps necessary to hire and onboard talent. Analytical comprehension of this process may be useful for guiding the content of job postings and the expectations of hiring managers, as well as for simulating flows and timing from the issuance of job requisitions to onboarding. This presentation demonstrates that application arrival behavior may be effectively modeled using an ensemble of Gamma-Poisson distributions with mean rate and overdispersion parameter distributions correlated to work site, field of practice, and career stage, using application data from a large research organization collected over several years.      Auto-generated transcript...   Speaker Transcript
Christy Spain OK, so now I need to confirm your name, company, and abstract title, please. Name is Thor Osborn. Company is Sandia National Laboratories. The abstract title should be Employment Application Arrival Model for Talent Acquisition Simulation and Management. Christy Spain Perfect. And you understand this is being recorded for use in the Discovery Summit JMP conference and will be available publicly in the JMP User Community. Do you give permission for that use and recording? Thank you. Christy Spain And I think we're ready, Thor, so I'm going to mute myself, and you can go whenever you're ready.   Okay. So, my name is Thor Osborn. I work at Sandia National Laboratories as a principal systems research analyst. My talk today is Employment Application Arrival Model for Talent Acquisition Simulation and Management, and I hope you enjoy it. I've got three basic objectives here. One is to make the business case for why you'd want to model employment application arrival times. Another is to show a straightforward process for creating a concise, broadly applicable model using this kind of data. And then I'll demonstrate that process on data from a large research organization and show, briefly, what you can do with that. Technically, I'll go over understanding the data, because I think this kind of data is a little unusual for most folks.
The source models, essentially, this is a source modeling analysis, so i'll be talking about the Poisson and Gamma-Poisson models. i'll do some analysis and some transformation to make this easier to work with and then briefly touch on how you might go about making models. So, as far as motivations go there are three basic areas of motivation. One is to improve the talent acquisition process and business function. Another is to improve the understanding of executives leading the company, and then the third is to help frame expectations for hiring managers. i'll start with talent acquisition. The hire rate and lag depends on flows through vetting stages in the talent acquisition pipeline. But the critical thing is, you have to have sufficiency of employment applications before you can go anywhere with the process. And the rates and patterns vary quite a bit depending upon the field, the specificity of the postings that are put out, the competition, how much you advertised, and so forth. A common mathematical framework for application arrival could enable better understanding of the trade space for improving application capture rates, as well as other things. Some key relationships application capture rate and variance versus employment context, job site, career level, field of practice. Capture rate impacts of adjustable variables advertisement job posting specificity, the posting language--and by that I don't mean English versus Spanish, but rather the way things are framed. And targeted recruiting efforts. And then the capture rate impacts of external factorsthings like economic conditions, competition in the field of practice, how big is the professional population that you're really reaching within your recruiting area. And I think it's important to recognize that talent acquisition faces the typical constraint that you often see with project or program execution--time, quality, and cost. It takes time to collect applications. You want to get a high quality of applicants. That costs money, and the faster you try to go the more it's going to cost. So there are trade offs here. Executives often get involved in the process through workforce planning, which is a catch all term but, basically, you know say an annual plan. But absent relevant models and feedback on the cost of things, the consideration of the typical constraint and allocation of staffing budget or processes could be subjective or absent. And by that I mean at the executive level, HR is often viewed as simply a necessary cost, but not necessarily a lever for moving the company forward. Partly due to lack of information, so providing information could be a way of improving that situation. Also, hiring manager expectations. Basically we're dealing with small number statistics, small numbers of applications. Human beings are really subject to pattern bias, and most hiring managers don't hire people every week, so they're not used to looking at this from the perspective of a lot of different cases over a long period of time. And that can lead to anchoring in the last situation they faced, that can lead to pattern bias where they look at the number of applicants heading down from one week to the next and figure that must mean that they've accessed everyone who's interested. And the problem with these intuitive responses to biases that can lead to overreactions because, as always with small number statistics, you get misinformation if you don't take it in that context. So a bit about the data. 
Arrival data are tied to specific job requisitions. So, you have a job requisition, there's a posting for that requisition, and the applications arriving in consequence of the posting. They're characterized in several wayssite location; career phase; visibility, how broadly it's made visible; field of practice, what you're doing; and specific requirements that can be very broad or very narrow within those fields. And the applications can be submitted during the window of time when a job posting is accessible, so there's a defined time frame usually. They're tracked by date. Now, in this case, using real data for the analysis later I have to use the date of last submitted application as the posting closure date, but it's not necessarily true. But, as often happens in real data, we don't have all the data we'd like to have. And i'll say finally that you're counting the number of applications per day. When you see zero applications for a day that's a count of zero. So thinking of a national pool of potential applicants for a job posting, employer puts out information into the world. Some of the people in the national pool of potential applicants make applications. it's generally assumed that that will be few relative to the total who could. And so you have this vast applicant pool. It's a source and any applications or minor perturbation on that source. And, just to be very explicit, which this could be very boring for people who are used to dealing with Poisson mathematics, but if you see one application arrival on day one, that's one instance of a count of one. Day two there's no application arrival. That's an instance of a count of zero. And so on the far right, you see three instances of zero, three instances of one, a two and a three. And that can be fit, although it's not very much data, it can be fit with the Poisson model, and you see the blue line there in the lower right. So an average rate of one application arrival per day, but the actual count per day is going to vary across the distribution. And that's it. So, the Poisson distribution can be used to describe probability of a count produced in a unit of time by randomly emitting source of discrete items that have some constant mean rate of emission Lambda per unit time. And you get this equation. And, so in the case of a count of four, which showed a very small probability in the previous slide, you essentially have one to the fourth. times e to the minus Lambda, so basically 37% divided by four factorial, or 24, and so something in the neighborhood of one and a half percent, which jives with a small but but nonzero probability. So, as a first hypothesis we'd assume that members of the nationally distributed pool of potential applicants for this broadly advertised job act in a uncoordinated manner regarding employment opportunities, and so a Poisson model could make sense. And so it's a reasonable initial hypothesis. Now, what if they do interact. Then, that can be conceived in terms of the Gamma-Poisson source model, which is essentially a Poisson components using a Gamma distribution as a mixing distribution. Or you can think of it as a blurred out Poisson because the rate, the average rate, is not a constant average rate, but it changes from one moment to the next. And this can happen under an alternative hypothesis that applicants behave in a coordinated manner, which could happen if there were networks of people talking to each other about job opportunities and so forth. 
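For reference, the two source models just described can be written compactly. This is only a sketch, using the standard Poisson form and, for the Gamma-Poisson, a mean-and-overdispersion parameterization (the one I understand JMP's discrete fits to report, where sigma = 1 collapses back to the Poisson); the worked value repeats the count-of-four example from above.

```latex
% Poisson: probability of a count k given a constant mean rate lambda per unit time
P(K = k \mid \lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!},
\qquad \text{e.g. } \lambda = 1:\; P(K = 4) = \frac{1^{4} e^{-1}}{4!} \approx 0.015

% Gamma-Poisson: the rate itself varies, drawn from a Gamma mixing distribution
P(K = k) = \int_{0}^{\infty} \frac{u^{k} e^{-u}}{k!}\, f_{\mathrm{Gamma}}(u)\, du,
\qquad E[K] = \lambda,\quad \mathrm{Var}[K] = \lambda \sigma
```

Mixing the Poisson rate over a Gamma distribution is what produces the extra spread (variance of lambda times sigma rather than lambda, in this parameterization) seen in the overdispersed fits later in the talk.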
Now here is a slide on how to generate the Gamma-Poisson. This is basic stuff, but I find looking at these distributions can be helpful. So I'm going to break out and do a little DEMO in JMP. Now what you see here, random Gamma four-one, just means random Gamma with a parameter of four and a scaling parameter of one, so average is four. You look at the distribution. There's 100,000 rows here; you get a curve like you saw on the slide. Average is about four, which happens when you do something lots of times. it's almost exactly what it's supposed to be. Now, take that distribution and use it as the feed for a random Poisson. And you get this. Which doesn't look like the Poisson from before because it's been smeared out. If you look at a Poisson fit, you'll find that the best fit for four as an average, which is all it can really do, has a narrower dispersion than the Gamma-Poisson and just to verify that yes, it's a Gamma-Poisson, look at the red line there. And it fits really, really well. Almost perfectly. And so, essentially, this is, this is a case where having gone through that communication process, if you will, the Poisson is no longer the best model. The case I just showed is shown here in the upper right of this quad panel. Sigma equals two, the over dispersion parameter of two. If you have an over dispersion parameter of one, Sigma equals one, on the upper left, then that's essentially the same as the Poisson distribution all over again. But as the over dispersion parameter increases, goes from a hump followed by a tail, instead to something that looks more like a decaying exponential. You'll see that in the lower right in the blue, the colors are switched here opposite what they were before, so if there is a confusion on that, sorry. Now here's with real data. So the case of broadly accessible, early career mechanical engineering posting. 140 applications to create 131 counts, many of which are zero, so this is spread out over time--one hundred and thirty-one days. The Gamma-Poisson fits best, and you see that in the upper left. I'm using Akaike's criterion, the AIC metric. In the lower left, you will see an experienced position. Now experienced professionals don't have the the networks of, say, university placement folks and so forth. Or necessarily the constant immersion with other people that you would see in a university setting. This is not proof of anything but just a way of rationalizing that perhaps that's why you see a more classic Poisson behavior with experienced professional responses typically. And just to show here, it seems to correlate with the number of applications as well. Other factors that would matter seem to be the career stage. As I just mentioned FLSA Status (Non-Exempt) meaning, say, technologists as opposed to exempt staff. Those factors seem to matter, but there's definitely a bias as the application count increases, becomes more likely to be Gamma-Poisson distributed. Now, how this data is prepared for analysis. Application dates by job req. Tabulated setting zeros for when there is no count. And then fitting the Poisson and Gamma-Poisson for each req to find out which one gives you the better AIC value. In parallel, requisition summary table was made with one line, or one row, per requisition, and then those are fused together to give parameters for the Poisson and Gamma-Poisson, as well as the AIC parameter that tells you which one is more likely a better fit. 
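Before the real-data summaries that follow, here is a minimal JSL sketch of the simulation demo described above; the table and column names are illustrative, and the discrete fits themselves are made from the Distribution report's red-triangle menu, as in the demo.

```jsl
// 100,000 draws: a Gamma(4, 1) rate feeding a Poisson count yields a
// Gamma-Poisson distribution that is visibly wider than a Poisson with mean 4.
dt = New Table( "Gamma-Poisson demo", Add Rows( 100000 ) );
dt << New Column( "rate", Numeric, Formula( Random Gamma( 4, 1 ) ) );
dt << New Column( "count", Numeric, Formula( Random Poisson( :rate ) ) );

// Look at both distributions; the Poisson and Gamma-Poisson fits (and their
// AICc values) are then available from the count histogram's menu.
Distribution(
	Continuous Distribution( Column( :rate ) ),
	Continuous Distribution( Column( :count ) )
);
```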
And you get a summary for Poisson and a summary for Gamma-Poisson after subsetting, because these are difficult to deal with together. You can use the Gamma-Poisson with a sigma of one, but then you end up with a zero inflation, and it becomes complicated to deal with, so instead I'm just treating them separately. Whichever subset is used, the process will show fits linear model for the parameter versus the drivers. The parameter is normalized by the model. And then I look to see if a common distribution is plausible across all the driver conditions, which in this case is essentially the requisition context of site location, early or late career, and so forth. If it's possible that they fall in the same distribution, then it's possible to create a common distribution parameter set and use that for the entire data set. Otherwise, the approach fails and I have, at this point, no mechanism to deal with it, so fortuitously that didn't happen, but it is something to think about for the future. So here's some Poisson parameter distribution by context where context has to do with the site, whether it's an early or late career, or established you could say, and whether it's, well, which field of practice. Every one of these has a "B" meaning broad because analyzing internal only job requisitions is a difficult, different thing with much more constrained opportunity, and so I'm sticking with simply the broad case in this presentation. What you can see is the mean variance differ by requisition contexts pretty obviously, and no, they don't fall within the same distribution. You can demonstrate that easily. Looking at a smaller subset, just the initial chunk, the Site A early set, it becomes more obvious that the means and the standard deviations are all over the map. Now here I'm going to also do a little demo. So I have this data. Now, this is a really simple model. I'm going to take the Poisson parameter output, and I'm going to use requisition context, which is this multi-level subjective variable. And run that, and you see that there's an RSquare of about point four. That means that it's not insignificant. The model fit is actually pretty good. And the fit for this parameter is very good. It doesn't cover all the variance, but that's okay because we don't expect it to; it's just saying that this is a pretty good model for what it does. It could also be done purely as a tabulation, but I like this approach because it makes it easy to work with, and it gives me a sense of how much variability is being dealt with in this fit. It can go here, and look at the actual by predicted, and you see, yeah, it's not a great model, but it does serve a purpose. And another thing you can do is save the prediction formula. And so you see here. Because I've already done this, this is subscripted to automatically. You see that what this amounts to is essentially just a constant plus an offset by requisition context. There's no rocket surgery involved. But now, if I want to do a normalization, add a column. Call this Norm Poisson Again because I've already done it once for the purposes of making this table in the first place. Find Poisson. There's my Poisson parameter. Divide that by the model fit. That was actually supposed to work. Not sure what's going on here. Okay, never do demos live that's the rule, but this has always worked in the past, and I'm really not sure what's going on. There's my Poisson parameter. There's my prediction formula. Oh. You should never try to divide something by itself. 
That's why it didn't work. It's being stubborn now. Okay. Sometimes subtle things matter. So here's the normalized Poisson. And what's interesting about that is that you can fit this. And from the available bottles you see that the Johnson SL fits the best and does a pretty good job. And you can also do Fit Y by X. Take this output and fit by requisition context. And get this Oneway plot. That looks a lot more regular than the previous one that I showed. Much more plausible that these have the same distribution. You will see a larger range when on the ones that have more data but that's not, that's not surprising, really. And if you look at the unequal variances test, you can see that there's no indication that they unequal variances. That's not proof that they don't, but it's a way of saying that it's reasonable to use a model where you have assumed that they have the same distribution. So. Back to this. Oh, I also did a Kolmogorov–Smirnov test of each context versus all of the rest of them, and of so 78 cases, only one of those showed a p-value of less than .05. P-values don't prove anything, but what it does give an indication of is that it's reasonable to treat these just coming from the same distribution. And here's just the blown up version for that one smaller subset just before, but you can see that this is plausible; it's not proof, but it's plausible. And with the goodness of fit test again, it seems reasonable that the Johnson SL is a reasonable model to use here to describe the Poisson parameter distribution, the normalized Poisson parameter distribution, for this dataset. This same thing can be done for the Gamma-Poisson and that subset of data, and that Lambda parameter, which is the equivalent parameter. And once again that works out. For doing the Sigma parameter I actually use a Sigma minus one because Sigma is on a baseline of one. And the model here is even lower quality, if you will, but it's necessary to use separate modeling step because Lambda correlates to Sigma, and so you want to be able to have that effect modeled within it. Again, you get a decent fit. In this case it's the Johnson Su. And so here's what's going on. We generate synthetic random distribution parameters. From a data subset, generate a linear model based on context. Normalize a parameter distribution and fit it to a common parametric continuous model. And then to generate the synthetic parameter obtain a random number from the normalized parameter distribution and multiply that by the appropriate linear model outcome, which is to say, to "de-normalize" it. Now evaluating synthetic random model parameters, I used again the KS test, and you can see that there's no way of really telling them apart, so at least it is a way of saying that it looks believable compared to the real cases. And I also made a composite model. Turned that into a function so it can be a callable function that would then, and this is in JSL, callable function in JMP Scripting Language for generating parameters for random job requisition with the context as the input, all the different context features. Now, not going to go into all that detail, but the point is that you can create a visualization using synthetic random parameter pairs that gives you a sense of not only where things typically are in this heat map but also kind of an idea of the outliers what's a plausible outlier and how far does it really go. 
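Restating the normalize/de-normalize recipe compactly (the symbols here are mine, not from the slides): each requisition's fitted parameter is divided by the linear-model prediction for its context, a common continuous distribution is fit to the normalized values, and a synthetic parameter is produced by reversing that step.

```latex
% Normalize each requisition r by its context-level model prediction
\tilde{\lambda}_r = \frac{\lambda_r}{\hat{\lambda}(c_r)}, \qquad
\tilde{\lambda}_r \sim F \;\; (\text{here a Johnson } S_L \text{ fit})

% De-normalize a random draw to generate a synthetic requisition parameter
\lambda^{\mathrm{syn}} = Z \cdot \hat{\lambda}(c), \quad Z \sim F;
\qquad \text{similarly, } (\sigma - 1) \text{ is normalized, fit (Johnson } S_U\text{), and de-normalized}
```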
And you can also see if you do a correlation that the correlation between Lambda and Sigma and between synthetic Lambda and synthetic Sigma are almost the same. It's modeling that relationship reasonably closely. And then, if you look at this with the real data thrown in, which is little black dots in the main graph. Ninety-eight percent of the synthetic density is in the reddish region indicated by the cross hatch, which is the middle inset with a green square around it. All of the data points, of which there are 43, fall there. And so. This again is the synthetic is modeling what really happens reasonably well. It's believable but it shows you what could happen in that other few percent of cases. So that's as far as I've taken the modeling, but what you can see is that you can use context; you can use other variables like how much specificity is put into the language for the job posting that's going to make fewer people qualified, is going to maybe scare off some people. You can look at the way that the language is crafted. Nowadays, people are using tools in HR to craft language that is more acceptable. You can see how much difference that makes in terms of what you get over time and multiple job requisitions. You could treat these as factors for the model, and then by that you could tune how you apply these features in creating requisitions depending upon what you need. But now I'm going to switch into, okay, what does this look like again. Again basic principle, so for a case of a Poisson with seven per week, 30% of the time, the count will be five or fewer. So you should expect one of those is that you should expect that the number of applications require time required to obtain a reasonably competitive selection for hire is going to vary, because even with an average of seven the number that you get is going to vary quite a bit. The variance of the Poisson is the same as the parameter, and so you get a broad variation. And so, if you're used to thinking in terms of long term averages or anchored by another case you could be thrown off by any specific instance. And the problem is this pattern recognition bias. The clustering illusion is this tendency to consider that the inevitable streaks or clusters arriving in small samples means that there's a nonrandom effect going on, there's some kind of intent to it. And it's clearly irrelevant for Poisson distributed data as R.D. Clarke found in 1946 with his analysis of German V-bomb. sites falling on the London area. People would see the groupings occurred, and they would think that that meant something, but he was able to demonstrate that it was pretty well satisfied by looking at the process as Poisson distributed, meaning that you're going to get clusters anyway sometimes, and certainly about as often as you saw. So that meant really what was happening was not that they were aiming for anything in particular, but that they were aiming in general for London and they kind of generally hit somewhere in London. In this case here, looking at jobs, the likelihood of getting a sequence over a span of three weeks, where you get a decreasing or increasing count is about 12%. In this example, the likelihood of getting a declining two week count, which is to say one's bigger than the other, is 45%. These are really short patterns; they shouldn't be thought of as meaning anything, but they can be thought of that way by people who aren't aware of the nature of the statistics. That's really the point here. And that can lead to overreactions. 
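As a quick sanity check of the figures quoted above, assuming independent weekly counts with a Poisson mean of seven applications per week, a few lines of JSL reproduce both numbers (the simulation size is arbitrary):

```jsl
// Chance of seeing five or fewer applications in a week when the true mean is 7
p5 = Poisson Distribution( 7, 5 );   // cumulative P(count <= 5), about 0.30

// Chance that next week's count is lower than this week's, by simulation
nSim = 100000;
declining = 0;
For( i = 1, i <= nSim, i++,
	If( Random Poisson( 7 ) > Random Poisson( 7 ), declining++ )
);
Show( p5, declining / nSim );        // the second value lands near 0.45
```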
Now the variability by field of practice is about an order of magnitude. Some fields are harder to source than others, so people's expectations can be skewed if they don't understand those differences. And if you look at requisitions in general, not just by field of practice, you can see about a factor of 25 difference in the 95% range from 2.5 percent up to 97.5 percent. It's still a huge variability in the rates. And then if you understood how this was being affected by how the language of the posting was crafted, how narrowly, how generally, what the language is attracting, advertising all these factors, then you could decide how much you want to put into making those those rates higher and getting the process done faster. So here's a case for a specific job posting using that simulation model that I talked about earlier. Established professional at Site A and discipline field of practice 9, again this is all proprietary stuff, so it's not going to be disclosed as to what that is; that's just a particular kind of professional. You get this very broad difference from across the 95% confidence, somewhere between zero and nine with a median of two. This can throw people, this small numbers statistics can throw people, because they expect things to be more regular that. There's limitations of the model and the approach. Always the data, the quality of the data, matters. Infrequently hired fields with rare skill sets are going to be essentially missing from the data set in all likelihood if data collected over a short time frame. But on the other hand, a lot of the external variables that you don't have control over may be changing during the timeframe of data collection if you use a long timeframe as the basis for these analyses. And the modeling approach does not consider self cannibalization. If you have two requisitions open within a field, and do applicants apply to both or do they pick one. Maybe in cases they pick the one that they feel is the best chance. This model doesn't do anything about that but would represent the real world outcome just by capturing what happens. Doesn't represent the scope of opportunity missed, though, because if you'd probably like them to apply to all of them that they could be qualified for just to see where they might fit. So conclusions. Some employment application response to a job posting tends to be distributed as Poisson or Gamma-Poisson. And this was checked over a large data set consisting of about 2,500 total requisitions. The distribution parameters for the application response varies substantially. If normalized they can be fit to common profiles. And those concise models that result from the common profiles facilitate generation of synthetic random requisition models. This would allow a person to do some kind of scoping or analysis on likelihoods over different circumstances. I think it's important to mention that application arrival models can fill an important gap for understanding the complete employee lifecycle because they'll provide perspective for hiring managers and staffing professionals regarding those pattern biases. They can be used in discrete event or agent based models as a way of generating applicants if you want to do that in a way that's realistic. And perhaps most importantly, from an economic perspective, it's a way of framing cost per applicant versus characteristics of the job and various other adjustable and out of one's control variables external factors. 
That, then, would give a better understanding of what to expect in terms of how quickly openings could be filled and how much it would cost to do that, depending upon relationships with advertising and so forth. That concludes what I meant to present today. I'll just say that if you have any questions, you should have contact information from this video, or wherever it's placed, so feel free to reach out. Christy Spain Okay, that was great. Yeah, except for the part where my thing didn't work because I screwed up the demo. But that was a good save; it was a good save. I finally realized what I was doing. Yeah, JMP will not do stupid things, no matter how much you want it to.
Lavada Blanton, Business Analytics Student, Oklahoma State University   As the challenges of COVID-19 made more businesses switch to virtual resources, the film industry was no exception. While billions are staying indoors and taking socially distanced precautions, production companies must decide the risks involved in the traditional movie release in theaters. Alternatives such as Netflix, Amazon Prime Video, and HBO Max were once frowned upon by critics but now have become respected players in the film industry. We now know that some movies released during COVID-19 were less than critically favorable but recent hits such as Godzilla vs Kong and Soul have given a glimmer of hope to decision makers. This project analyzed 1,605 movies, released through either the traditional movie theater format or through a streaming service since 2010. We look at key indicators in box office success such as IMDB score, critic reviews and audience reviews to evaluate the best “Movie Mix” to be distributed either in theaters or on a streaming service. In order to compare distribution type, box office success in streaming movies is predicted based on a theatrical release model. Then box office revenue is compared between streaming and theatrical to profile movies based on categories such as Genre and Release Month. Finally, a decision tree is used to streamline recommendations. These recommendations can be used by production companies in formulating the optimum release strategy and resource allocation.     Auto-generated transcript...   Speaker Transcript Mike Anderson We are now officially recording.   All right, you understand this is being recorded for use in the jump discovery summit conference and will be available publicly in the jump user community do you give permission for this recording and use.   Yes.   Hello, my name is Lavada Blanton. And, I am a graduate student at the Oklahoma State University's Masters in Business Analytics and Data Science program. My project is called Netflix or AMC   Predicting Release Strategies in the Age of Options.   First to go a little bit introduction of what my project is about.   Covid-19 cause production companies to decide the risks involved in the traditional movie releases in theaters.   Streaming services such as HBO, Netflix and Amazon were once frowned upon by critics, but now have become a respected key player in the film industry.   This project looks at key indicators and box office success, such as IMDB score, gross revenue, and critic reviews to evaluate the best movie mix to be distributed, either in theaters or on a streaming service.   And the objective for this project is to predict whether movies, should be distributed through streaming services or box office. Success was measured by an IMDB score of 5.5 or higher, which is above average.   I sampled 1913 movie titles released between the year 2010 and 2021.   I had various attributes, such as genre, premiere date, duration budget, theatre box office revenue, and content rating, which were used to determine the best movie mix on the respective release types.   As you can see right here, there were far less streaming movies than theatrical movies. To counteract that I did reduce it to 5.5 or higher for IMDB scores, and that evened it out a little bit.   Next, I will go through my approach. First, I did data preprocessing and data collection. This involves gathering data from IMDB, Rotten Tomatoes, and the numbers that come.   
Next I filtered movies released before 2010 and took those out of the data set. And, I created a Covid flag for movies released between March 2019 and March 2021.   And as you can see underneath here, there were some transformations specifically in duration, the pre- and post-transformation is shown here.   Next, I did a box office prediction for streaming movies. This was produced with a neural network with a sample of theatrical movies in JMP Pro.   Finally, I did a decision tree. This was filtered out. The data set was filtered out with movies from 5.5 and higher only. This reduced the data set to 1400. And, with the sample I was able to use a target of either streaming or theatrical for a categorical decision tree.   The use cases for this data is an example in 2019   the global film industry is worth around $136 billion. This means that this is a lot of risk and reward involved in every aspect of how a movie is made.   And these movie mixes could be used by executives in top production companies, streaming services, or even theaters to make informed decisions about future movies.   Below is a list of the top five highest grossing movies of all time. And, as you could see   they are all 7.8 or higher. So, this cut does kind of show that these high IMDB scores do translate to box office.   Okay next i'm going to dig down deeper into the models that I used in JMP Pro.   First, I did a neural network. This neural network was created to predict box office sales in streaming movies. As you can see, we had an R square .87.   And this is fairly good for the neural network. Next I did a stepwise variable selection. This was used to choose what variables would be more most suitable for the recommendations.   Finally, I did a bootstrap forest to create the final recommendations with the target being distribution type.   Finally, I have my recommendations. I have two movies, based on streaming, and two categories, based on theatrical.   First, I have Cheap Feel-Good Comedies. These have an average duration of about 91 minutes. They're equal comedy and drama, as well as rated R.   And then have an average budget of $2.1 million. Next, I have Big Budget Thrills. We have an average duration of 147 minutes. A genres   50/50 Adventure/Horror, and, 50/50 R or PG-13. We have a budget of $146.5 million. Next, for theatrical I have Biographical Reenactments on a Budget.   This is an average duration of 97 minutes. The genre is 41% biography and 35% adventure. We have majority PG.   And also, we have a budget of $8.6 million. Then, finally, we have the Adventures for All. This is an average duration of a whopping 127 six minutes. 73% are Adventure.   And then we have 71% as PG-13 and a budget of $10.3 million. Thank you so much for watching my presentation. Do you have any questions?
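A hedged JSL sketch of that final classification step: the Partition platform grows the categorical decision tree with release type as the target. The table reference and column names below are hypothetical placeholders for the attributes listed above, not the project's actual variable names.

```jsl
// Hypothetical sketch: launch the Partition (decision tree) platform with
// release type as the target; splits are then made interactively as usual.
dt = Current Data Table();   // the filtered movie table (IMDB score of 5.5 or higher)
tree = dt << Partition(
	Y( :Distribution Type ),                               // Streaming vs. Theatrical
	X( :Genre, :Release Month, :Content Rating, :Budget, :Duration )
);
```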
Kevin Doran, Senior Staff Engineer, Intel Corporation   Data collection teams within companies amass amazingly large amounts of information. And while these data are very valuable, they are not often translated into a format that is usable by the recipient. Further compounding the problem is that different business units consume the information in different ways, which, unfortunately, often means that valuable information is never consumed or utilized. How does one best translate the needed information into an easy-to-use format? Furthermore, can statistically significant information be communicated through one interface? This paper describes the creation of one multi-tabbed, filterable dashboard to communicate laptop computer systems data to both architecture and engineering organizations.       Auto-generated transcript...   Speaker Transcript Mike Anderson Okay, recording is now started. You understand that this is being recorded for use in the JMP Discovery conference and will be available publicly in the JMP User Community. Do you give permission for this recording and use?   Yes, I do.   Hello, my name is Kevin Doran, and I am a systems engineer for the mainstream laptop computer segment at Intel Corporation.   I'd like to share with you today my journey that resulted in a filterable, multi-tabbed dashboard that has changed the way that I communicate with my organization, and how JMP was an incredibly important part of that journey.   To begin with, if I can leave you with one message to take away from today, it is that a filterable, multi-tabbed JMP dashboard is an excellent way to communicate an amazing amount of information in a clear and efficient manner. So, as you work on your projects in the future, I really want to encourage you to think about using a multi-tabbed dashboard at the start of your project, rather than treating the dashboard as something for the end of the project, or maybe a last option. This is something I really think can affect how you communicate and make it much more effective.   Many of us have the job of architecting the future, whether that's defining or designing products and services to meet the perceived needs and wants of future customers: what progress is my customer trying to make, and how can I fulfill that? So oftentimes when we architect the future, we want to look to the future. We're forward looking.   The challenge with the future is that it hasn't happened yet, and there's no data there to support any type of decision you want to make.   So, since we don't have that information, and we love looking at data as engineers and scientists, we then look to the past.   The challenge with the past is that while it has data, it has already happened and it's finished. So, in many and various ways we try to build bridges between the past and the future, because I think it's really important for us to understand the past in order to develop a better future.   I think most people would agree that not all bridges are built the same. This is a picture of the Tacoma Narrows Bridge back in 1940.   It came to a very dramatic ending. If you'd like to look up the Tacoma Narrows Bridge in 1940 on YouTube, you can see how this all finishes up. But the reason why I bring up the bridge analogy and apply it to JMP is the fact that this bridge had a solid foundation.
And I think the foundation of a bridge as your JMP data table. You have to have that that solid data that can that complete data there in order to set your foundation. If you don't have that, you don't have a strong bridge to begin with.   I didn't think of the bridge structure itself as the JMP user interface that you make. In this case the bridge looked beautiful,   until the wind started to blow. And so, as you build your user interface, you may have a certain way of wanting to have that   user interface being used or manipulated, it may not always happen that way. There might be people who have different usage models and you want to try and build something robust not like the Tacoma Narrows. Everything was great to the wind started to blow and then bad things happened.   And then, finally, the transit across the bridge to me is how many people and how easy it is for people, then, to use your interface.   So, this didn't work out very well for them, but what I wanted to do with my organization was to build a bridge that look like this, something that had a strong foundation.   Something that had a very robust user interface, and something that was very easy to traverse. So, previous in my organization, we would use,   you know, best known methods. We would try and look at different computer board files. We would try and make our best guesses based on experience.   What I wanted to actually do is bring a user interface, that we can make statistically significant decisions,   and have that information searchable across multiple segments and filterable by important metrics. So, that would make not only having a firm database, but then also a solid bridge and something that was easy to use, so this was the goal of my journey.   So, I wanted to start with the past database bridge. Now, this was a bridge that was built a number of years ago. It met the needs at the time.   But, after several re-orgs, it no longer met the needs of my organization. The thing that's important here is that, while the interface was not very helpful anymore.   The actual data, the foundation, the data table that was behind this was very useful and it was very important. And so, I had a strong foundation to work with. My goal, then was, how do I build that new bridge, that new user interface into my organization and make it helpful for them?   So I'd like to take you a little bit on that journey and give you an idea on on the path that I took to make sure that people can consume my information both easily and accurately.   So I had this this database. I had my strong foundation, and then I started letting folks know in my organization that had this information.   As the mainstream system engineers or any way I can help them and set up some year over year visions.   And so, at that point, I started getting some questions and I went into JMP I create some tables or graphs or different types of reports.   And I'd answer their questions. And, then what would happen is those people would come back with some more questions and the new people would ask me questions.   And the challenge I ran into is the fact that I had so many snips that I was sending out, I felt like I was snip boy half the day and I go in and answer all these questions and send information out more would come in. And it just was   an untenable situation for me to sustain so I had to find a different way to communicate this information.   The answer to that for me was learning dashboarding.   
I do think that dashboarding in general, my opinion seems to be an underutilized platform within JMP. So, again, if you haven't read into dashboarding please look into that.   But what I was able to do was learn dashboarding and put all of these pictures into one interface that I can provide to people. So, I was getting questions...I'll give an example of a display.   Hey, Kevin can you go ahead and tell me the screen sizes and the display resolutions in your mainstream segment from a year over year perspective.   And so, eventually, what started as snips turned into a dashboard. So, I was able to give information out about display and answer people's questions that way.   But,   the challenge with the dashboard as I had created it is it didn't allow any further manipulation. So, it was a static display and then people start coming back saying well, what about this case and what about this situation.   And what I really needed to do at that point was provide a little more interactive feel for my dashboard.   And this actually was something that was very critical in my journey that I really want to share with you, and that is   You know first thing I did was add a local data filter. So, I think most people are very familiar with that and I was able to then have people manipulate the dashboard.   But something else really critical happened here and I really wanted to kind of impress this idea, and that is the fact that the data table that I had, that foundation,   had more than just my information in it. It had information in it about entry notebooks and high performance notebooks and premium notebooks and things like that.   And what I realized at that point is hey I can not only provide this dashboard,   not only for my mainstream segment, but I can then increase my scope or my purview,   increase my influence, and my networking, if I also provided this information for all of those other computer segments. So, at this point is a really important piece that   I took and subset at all the other segments, and then added that in as a filterable option in my dashboard. So, this was a really important piece that when you're going through this. If there's something in your data table   that allows you to expand your influence, I really want you to take a look at that and maybe use dashboarding and to help communicate that that expanded role.   Now, I had that and that was going really well. The challenge, then at that point was I was still confined to one dashboard, one topic. All of this was basically around the display platform attribute that I was, I was addressing.   And so, I then started getting questions about hey Kevin, can you tell me about RAM or what about graphics or what about platform mechanical metrics?   And so I thought, well, I can go ahead and take this display dashboard and then create that same infrastructure for all of my other platform metrics that I was being asked about.   At that point, then I had created multiple dashboards so when people would want an update on information,   I would end up sending them four to five JMP files and say hey all the information is here. You just need to open up four or five different files.   You need to go in and filter each one and good luck with that sort of thing and and that created another problem, because I was not making it easy   for my customer at all. I was sending out all these files, I had to go in and out and that just wasn't very easy.   
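As a minimal JSL sketch of that shared-filter idea (using the Big Class sample table as stand-in data), a Data Filter Context Box ties one local data filter to every report placed inside it, which is effectively what dropping the local data filter at the far left of the dashboard does:

```jsl
// One filterable pane: a shared local data filter driving two reports.
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
New Window( "Filterable dashboard sketch",
	Data Filter Context Box(
		H List Box(
			dt << Data Filter( Local, Add Filter( Columns( :sex, :age ) ) ),
			Platform( dt,
				Graph Builder(
					Show Control Panel( 0 ),
					Variables( X( :height ), Y( :weight ) ),
					Elements( Points( X, Y ) )
				)
			),
			Platform( dt,
				Distribution( Continuous Distribution( Column( :weight ) ) )
			)
		)
	)
);
```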
So, at that point, I was at the end of my JMP knowledge, and so I contacted JMP support and gave them the idea of what I wanted to do. I really wanted to take all these multiple dashboards that were on my screen   and actually bring them together into one multi-tabbed dashboard and, to my knowledge, this has not been done before. And, thankfully through   the help of JMP I was able to go ahead and receive a JMP script that actually gave me that capability. And, now I'm able, through all of this, to try and send out one file   that gives people all the answers or hopefully gives them all the information that they need.   So that was really the piece of this. And the reason why I wanted to share this journey is to hopefully save you months of time that you can immediately go to that end state that is very effective and very efficient.   So, I do want to show you what the JMP script looks like I will be the first to admit that I am a script hacker. [self-deprecating laughter] I'm not a coder.   But, I can do a good job making tables and making graphs and then I save the script to the data table or I save it to a script window and I append them all together.   So, at this point, I really want to give a big shout out to Nick Holmes at JMP support he's the one who I gave him my problem statement we work through several iterations of this.   And, he came up with this script that has done an amazing job for me. And so, what I want to do to make it easy for you as well, and help you in your role,   is not only upload my presentation, but i'm going to also upload this script for you as well as long as you don't have any you know variable word   conflicts or anything else, like that you should have multiple scripts or multiple dashboards on your table   open and you should be able to run the script and it will combine it all for you, without needing to do anything without needing to change any of this code. So I hope that can be very helpful for you.   So I have wanted to then show you now that I had the script, I want to show you at least a screenshot of my dashboard.   This was able again to transform a lot of detailed information behind it. And, what I was able to do here on this very simple database or a simple interface   was I could segregate by segments. So, that's up in the top left that was the increase in my scope and my influence. So, you can go ahead and filter by mainstream or entries are high performance systems.   Then I had a bunch of filter level metrics that were important to my organization.   And then the real value of this is on the top, I have a tab by attribute I have five tabs up here right now that showed display RAM, graphics, CPU and platform information. And, all the user has to do is scroll through the tabs or scroll through the   filters and get an amazing amount of customization and very detailed data all from this this interface.   And, I think that there's a real beauty in simplicity here. Because something that looks I would say relatively this simple,   actually, has over 1 million rows of data behind it. So, I think again that's part of the beauty here is that I can take something that I think created, it was created fairly robustly,   and be able to provide a whole variety of scenarios and information to my user all through something that looks very simple to use. And I think that's something that's really important is to make it easy to use, for your for your customer and your organization.   So, I spend enough time in PowerPoint it's time to go the dashboard. 
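The combining script itself belongs to its author, but the core idea can be sketched in a few hedged lines of JSL: a Tab Box takes alternating page titles and display-box contents, so each completed dashboard pane becomes one tab. The Text Boxes below are only placeholders standing in for the five panes described above; this is not the support script referenced in the talk.

```jsl
// Hypothetical sketch of the tabbed layout: swap each placeholder for the
// display-box tree of a finished dashboard (reports plus local data filter).
New Window( "Multi-tabbed dashboard sketch",
	Tab Box(
		"Display",  V List Box( Text Box( "Display dashboard pane goes here" ) ),
		"RAM",      V List Box( Text Box( "RAM dashboard pane goes here" ) ),
		"Graphics", V List Box( Text Box( "Graphics dashboard pane goes here" ) ),
		"CPU",      V List Box( Text Box( "CPU dashboard pane goes here" ) ),
		"Platform", V List Box( Text Box( "Platform dashboard pane goes here" ) )
	)
);
```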
I think that there's a lot of value to watching people on these these webinars to actually manipulate some of the JMP interface   and go through push buttons. I see a lot of value in that. So, I want to spend the rest of my time actually in JMP and showing you what I did.   I do want to start with the folks that have not dashboarded before. There are plenty of videos that actually show you how to go through that. I at least want to show you   a very brief example how I did it and least kind of give you an idea if you have not dashboarded before. So,   I have here four reports open. One table and three graph builders. And, what I want to now do is just put that into a dashboard. So, all you have to do is come to your data table,   and say File, New, Dashboard. And, it will come up with a number of different templates. Most of my stuff that i've done so far is just a two by two template.   You can pick any template or you can actually pick a template and modify it. So, what i'm going to do is i'm going to actually pick my two by two dashboard.   And you'll see I have my four reports and it's literally just a drag and drop. I'm going to go ahead and bring in my different reports.   Just into those boxes.   And now I have basically the template of my dashboard put together. Now I mentioned earlier that I also have a local data filter associated with this. And so, you'll see that there are these little thin lines around these different boxes here, and you can drag and drop different   boxes in here, and what I'm going to do is I'm actually going to drop the local data filter on as far left as I can go. Meaning that it will affect the   all four of these boxes. So, you can see that this local data filter will now apply to all four of these.   And I simply have to go to the hotspot or the red triangle and say run dashboard.   And it'll take just a few seconds, but what you'll see is then finally a dashboard with all of those features together.   You can then go ahead and change the heights and the aspect ratios and different things like that.   You have to manually at that point, and your local data filters and when that is ready to go ahead and be put into a script,   all you have to do is go to this hotspot, Save Script and I'm going to just save it to a script window. And then you get a very long   and detailed script that then you can simply cut and paste into your existing scripts. So, that's really all I did is I would create a dashboard.   I would say the save the script to the script window. And then what I would do is is select all copy it and then paste it into an existing   an existing script that I was appending. So that's how you go create a dashboard. Let me now spend a minute and show you how I created a JMP dashboard with a multi-tabbed dashboard. So, I'm going to go ahead and close all this out.   And so, give me just a second to do all that.   The system tray is not visible right now, so I can't close everything out   all at one time, but let me go ahead and do all this manually here.   Okay, so i'm going to open up my dashboard now.   And while that is loading, I want to make sure that as I'm communicating this, that part of that customer orientation that I think is very valuable   is to actually go through, and make it very easy for your customer to create the dashboard itself. So, what you'll see here in my dashboard is, I have one script   called Create CSA dashboard and I literally tell my customers.   
Open this open this JMP file all you have to do is press the little green triangle. Don't worry about anything else. It will all be created automatically for you. So, that's something that can make it very easy. Don't make it complicated for your customer.   This takes about two minutes to create so what I thought would be valuable is actually go into my script that I created and kind of walk you through that. At least show you some things that I found to be very helpful.   This script is fairly long but it's broken into basically six parts. I have one part each for my five different dashboards.   And then my sixth part of this is then the base of the code that I shared with you from JMP support and it wraps it all together into one dashboard.   So what I chose to do here is, again, I have four reports with each dashboard and I need to go off and create each of those reports. So, what I did here is I named each report.   This is the display dashboard that i'm giving you an example for, and so I named it Display and it was a tabulate function.   And then you'll see my code that created the JMP table. And again, all I did here was I went ahead and created the table. I saved the script to the script window. And I copied it in here. That's all I did.   One other thing that I did as well that has been helpful for me is to actually save this final report to my data table.   And I really did this for two reasons. The first is that I still get   one off sort of questions that I need to manipulate a specific report and then send them back that custom information. So,   by creating or saving the script to the data table this allows me then just quickly go to that script press the button very quickly get my information manipulate it and send it back.   The other reason why I do this is actually for my customer. I occasionally get some questions about hey Kevin can you change the title on this on this display or this report can you change the color of this? And, if they're.   you know advanced enough within JMP to know how to do that, all they have to do is go ahead and manipulate that themselves. They get the information without even contacting me. So, it's sort of teaching that person how to fish.   They can go ahead and change those things by themselves. So I've tried to enable them for that as well.   So this is the report. Then what I do is I build my second report and do the same thing.   I build my third reports build the do the same thing here.   Build the fourth report. Do that so now I have all four reports put together.   And then, much like as I showed you previously with creating the dashboard, I literally just copied and pasted all of that script information into this script here to create the dashboard.   And then, what I did is I put a name to it to make sure that I knew all the names were different on each of the dashboards.   So, and then at the bottom after i've done that five times, I will show you down at the bottom, down here,   create the tab dashboard, and this is literally the the code that I showed you in the presentation and the code that will upload to the website as well. So, you'll have all this information here.   Now invariably you are going to need to change something. You're going to forget a title of something your want to change a color.   There's two ways to go and do that. One is you have to recreate the report itself the way you want it.   
And then you could manually go in and recreate the entire dashboard, because the coding here actually has that report embedded in it. So, you can manually create it. Or what i've chosen to do that's been more   efficient for me is to actually go in create the new report and then change it into areas inside the script and then you're good to go. That way you keep all of your formatting your window sizing and all that the same as you'd like it.   So, say, for example, I'll go back to this table function – say I wanted to go off and create a different table or modify that table. All I would do is regenerate that report.   I would copy it into this area again to make sure the report gets regenerated. But then what I also have to do is take the same code   and then replace it in the JMP dashboarding scripts. So, I'll show you there's the little trick with, that is, you have to go out and look for, Ctrl-F, and you have to look for a   code called platform open paren. So, if you kind of see the table data here what I will do is I will look for the platform open paren.   And then you will see here inside the display dashboard the same code is in here. And so, you simply have to replace it there as well. So,   just remember if you're modifying something that's already existing you want to change it up in your report section, then you'll also want to change it in your in your dashboard section. Okay, so I now have my combined multi-tabbed dashboard.   You can see that it is a segregated or I have a filter on the segments. Again, I can pick mainstream, or entry, or high performance.   I have all my filters that are very important to me and my organization and then again there was something that's the real value with this   is the different tabs that are here. I have a tab on Display and, in this case, I have a table in here.   I have information on RAM so I get a treemap. That's a different way to display this. There's a lot of different variations of things you can put in here, and this is just some of my examples.   I can look at graphics information. There is CPU information and then something that I like within the platform is i'm real fan of the parallel plot.   That allows you to look at the variations, or at least the relationships between different X variables and then you'll see some box plots in here.   So, this is an example of a dashboard again there, there are a million rows or more than million rows of data behind this and I I think that's a fairly robust platform. It can really be manipulated very easily.   One thing I wanted to do in here for folks who want to pursue this is the Decision Tree platform. I would love to be able to create decision trees here that actually can manipulate through the data filters.   That is something that is now being supported in JMP 16. I don't have JMP 16 yet, so, I can't try it out. But anyway that platform is now added.   I think that would be incredibly valuable some sort of decision tree platform to come in here and actually have it affected by your local data filters.   So that is the, that is, the dashboard and so really and trying to end here – again just a reminder – that this is something that I really hope can be very powerful,   be a very effective and efficient way to communicate in your organization. And, I hope that this presentation and the associated JMP script can really help you and further your career. So, I thank you very much for your time.   And that is it.