Wednesday, December 15, 2021
Level: Intermediate — To achieve process-improvement results with design of experiments, you first reduce the error within the target process, then plan and run an experiment suited to the objective, analyze the resulting data, and find the optimal solution. You apply this solution to the process to confirm the improvement, and if the improvement is insufficient, you revise the optimal solution so that the result is secured. This effort is not mere design of experiments but a comprehensive DOE that integrates the whole series of activities. To master it, it is important to experience the plan-do-check-act (PDCA) cycle of planning, execution, analysis, and improvement in a practical way, even if only virtually. Traditionally, teaching materials such as paper helicopters and table golf were used for this, but they had the drawback that a single person could not learn efficiently in a short time.

Recently, a flying-ball simulator that can flexibly generate process data containing error was developed as a JMP add-in, making it possible for one person to learn comprehensive DOE efficiently. In this study, we linked the flying-ball simulator with JMP's design of experiments platforms and devised an educational program for experiencing the PDCA cycle of comprehensive DOE, based on custom (optimal) designs. A distinctive feature is the use of multi-level replicated experiments so that the optimal solution can be corrected by regression, even for nonlinear responses.
Level: Beginner — This study analyzes a questionnaire administered at a neurology clinic to ensure the quality of care, with the aim of obtaining an overall picture of first-visit patient satisfaction.

In the preceding stage of this research, we showed that selective two-sided causal analysis based on priority focus can be applied when planning effective and efficient measures to maintain and improve first-visit patient satisfaction. Building on that, the clinic wanted a more comprehensive view of patient satisfaction so that it can continue to maintain and improve it. To that end, using the data collected so far, we restored the weakly influential cause-side items that were not selected in the selective two-sided causal analysis and carried out an analysis along the lines of SEM (structural equation modeling).

The main cause-side items selected by the selective two-sided causal analysis, and the factors behind them, achieve a reasonable degree of fit in the SEM analysis as well. Adding the low-influence cause-side items and their factors gives a more comprehensive view of the causal structure, but the fit indices deteriorate correspondingly.

Going forward, we plan to conduct a new patient-satisfaction survey to improve the fit indices as we move the results of this study closer to a full SEM. In this presentation, we introduce the findings obtained through this series of analyses.
Bradley Jones, JMP Distinguished Research Fellow, JMP

As is evident from its name, the original intended use of the Fit Definitive Screening platform was to analyze Definitive Screening Designs (DSDs). The surprise is that this platform can analyze a much broader class of designs than just DSDs. It turns out DSDs are a very special kind of foldover design, which is a standard textbook design used for factor screening. All that is needed for the Fit Definitive Screening platform to do its innovative analysis is a foldover design. This talk demonstrates how to make foldover designs using the Custom Design tool and then analyze them using the Fit Definitive Screening platform. Several examples illustrate this two-step procedure, and the analytical results are compared with more standard approaches that ignore the structure of the design.

Hello. My name is Bradley Jones. I'm the manager of the JMP DOE group, and what I want to talk to you about today is a surprising use of the Fit Definitive Screening platform. If you haven't created a Definitive Screening Design and analyzed it using this platform, then you wouldn't know where the platform is; I'll show you that. But what I'm going to show you is that you don't have to use the Fit Definitive Screening platform just to fit definitive screening designs. It can fit other things as well, and that's the surprise. I'll start out by talking about the main idea of this presentation, and then I'll review how Fit Definitive Screening works. Of course, I'm going to show you by hand, but the platform does all the work for you, so you never really have to do all this tedious stuff. Then I'll have a couple of examples of using Fit Definitive Screening to analyze designs that are not definitive screening designs, and I'll make some recommendations at the end.

To start out, here's a definitive screening design. If you look at the first pair of runs at the top, you can see that each value here is plus or minus one, and each value here is minus or plus one. What that is trying to show is that whatever value the top number has, the bottom number has the opposite value: if this is plus one, then that will be minus one; if this is minus one, then this will be plus one. The fact that all six pairs of runs in this example are mirror images like this means that the definitive screening design is a foldover design.

Let's think about what foldover designs are and what it means, in terms of properties, to have a foldover design. For any foldover design, the main effects and two-factor interactions are uncorrelated, which means they are statistically independent. Orthogonal foldover designs exist for every multiple of eight runs. However, orthogonal main effects are not as important as main effects being orthogonal to two-factor interactions. You may choose to allow some non-orthogonality among the main effects in order to get this nice property that main effects are not correlated with two-factor interactions, which means that if you have active two-factor interactions, they won't bias the estimates of any main effects.

I want to talk about how to make a foldover design in JMP, and you can do this in the Custom Designer. You open the Custom Designer and add two-level categorical or continuous factors; by default, continuous factors are always two levels. Then you choose a model that only has main effects, which again is the default.
Then you choose a number of runs that is a multiple of two, where the number of runs has to be at least twice the number of factors. Then, in the red triangle menu of the Custom Designer, you go to Optimality Criterion and, in the submenu, choose Make Alias Optimal Design. Then you're done: all you have to do is click Make Design. After you see the design, you can check that the Alias Matrix contains only zeros for the main effects and two-factor interactions. If the number of runs in your design is not a multiple of eight, you may see some correlations between two-factor interactions and the intercept, but the intercept estimate isn't really important for screening.

Let me give you a JMP demo of that process. The first example here is how to make a foldover. I'm going to create a six-factor custom design, with factors A through F, and you can see that the A through F main effects are the only things in the model, which is the default. Now I go to the red triangle menu and choose Alias Optimal, which is the last choice here. Then if I say, "Well, let's do 16 runs instead of 12," and click Make Design, it goes off and computes the alias-optimal design for six factors and 16 runs. Now I check by looking at the Alias Matrix, and you can see that everything in the Alias Matrix is zero. Another thing I can do is look at the color map on correlations, and I can see that the main effects are all white, and the rectangular area showing main effects versus two-factor interactions is also all white, which means that this design is, in fact, a foldover design, and I could use it to do a screening experiment. Let me go back to my slides.

All that the Fit Definitive Screening platform does is check that the design is a foldover. It doesn't actually require that what you have in the current table is a DSD. You can use Fit DSD to analyze any foldover design, and that's the surprise. That's the main idea here: first, that you can create foldover designs very simply using the Custom Designer, and second, that if you have a foldover design, you can use Fit Definitive Screening to analyze the data.

It turns out that since main effects and two-factor interactions are orthogonal to each other in a foldover design, you can split the response that you observed into two new responses. One response you use for identifying main effects, and you could call it YME. The other response you use to identify two-factor interactions, and you could call that Y2FI. Because the main effects are orthogonal to the two-factor interactions, the two columns that you create this way will be orthogonal to each other. The way you do it is to fit the main effects model with no intercept and save the predicted values of that model; you can call that column YME. The next thing you do is save the residuals from that fit, and those residuals are in the space of the two-factor interactions. Now, these actions are unnecessary to do yourself, but if you want to know what's behind the scenes, this lets you carry them out by hand. Of course, you can just use the Fit Definitive Screening platform, and it does this behind the scenes. Let me make a small digression.
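To make the two ideas above concrete, here is a minimal numpy sketch — not JMP's internal code, and starting from an arbitrary 8-run two-level design rather than an alias-optimal one — of (1) folding a design over by appending its sign-reversed runs, which makes every main effect exactly orthogonal to every two-factor interaction, and (2) splitting a simulated response into YME (predictions from a no-intercept main-effects fit) and Y2FI (the residuals), which come out orthogonal to each other. The factor count, run count, and simulated effects are illustrative assumptions.

```python
# Illustrative sketch only: foldover structure and the YME / Y2FI response split.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

# Start from any 8-run, 6-factor two-level design (here: random +/-1 levels),
# then fold it over by appending the sign-reversed runs.
half = rng.choice([-1.0, 1.0], size=(8, 6))
X = np.vstack([half, -half])          # 16-run foldover design

# Every main-effect column is orthogonal to every two-factor-interaction column.
two_fi = np.column_stack([X[:, i] * X[:, j] for i, j in combinations(range(6), 2)])
print(np.abs(X.T @ two_fi).max())     # 0.0 for any foldover design

# Simulate a response with main effects of C, D, F plus a C*D interaction.
y = 3*X[:, 2] + 2*X[:, 3] + 1.5*X[:, 5] + 2.5*X[:, 2]*X[:, 3] + rng.normal(0, 0.3, 16)

# Split the response: fit main effects with no intercept, keep the
# predictions (YME) and the residuals (Y2FI).
beta_me, *_ = np.linalg.lstsq(X, y, rcond=None)
y_me = X @ beta_me                    # lives in the main-effect space
y_2fi = y - y_me                      # lives in the 2FI (plus noise) space
print(round(float(y_me @ y_2fi), 6))  # ~0: the two pieces are orthogonal
```

In practice you would build the design in the Custom Designer and let Fit Definitive Screening do the split for you; the sketch only shows that both properties follow from the foldover structure itself.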
I think it's valuable to use the model heredity assumption, which is that, generally speaking, a two-factor interaction is much more probable to be active if both of the main effects that compose that interaction are active themselves. For example, if the factor A and factor B main effects are both active, then you might want to consider fitting the AB two-factor interaction. Now, this is not a physical law; nothing makes it absolutely necessary that it hold. And yet empirical evidence has shown that such models are much more likely than models where interactions are active while the main effects are not. Everybody who has done a lot of experiments has counterexamples to this; all I'm saying is that those counterexamples are comparatively rare.

Why would you make this assumption? Here's the reason: if you use the heredity assumption, the set of possible models is much smaller than if you don't. In the example I showed you earlier with factors A through F, suppose it turned out that only factors C, D, and F were active. Then you would only consider the three two-factor interactions CD, CF, and DF. Since there are three of these interactions, there are two to the third, or eight, possible models: one with no interactions, three with one interaction, three with two interactions, and one with all three interactions. However, if you wanted to look at all the two-factor interactions among factors A through F, there are six choose two, or 15, possible two-factor interactions, which means there are two to the 15th, or more than 32,000, possible models. Sifting through all of those is a much harder model selection problem. If you can rely on the heredity assumption, you save yourself a lot of work and also a lot of ambiguity in making your model selections.

Going back to how you do this: you form the two-factor interactions involving the active main effects and then do stepwise regression up to the point where the mean squared error of the model is relatively small — if you have an estimate of sigma squared and the two are roughly comparable, that is the time to stop. Of course, this still isn't necessary to do by hand, because Fit Definitive Screening does it for you.

Let me show you a couple of examples of this process, starting with an example from Doug Montgomery's Design and Analysis of Experiments textbook, the eighth edition. It starts by running a resolution III fractional factorial with seven factors and eight runs. Let me show you that design and how it is analyzed. Here's the resolution III design, and if I just run Fit Screening, what you see is that B, D, and A are the active effects, and maybe G is marginally active but small compared to the effects of B, D, and A. But let's evaluate this design; I just click the Evaluate Design script in the table. If I look at the Alias Matrix, you can see that factor A is confounded with the BD interaction. You can learn the same thing by looking at the color map: the correlation between A and the BD interaction is one, which means I don't know whether what I'm seeing is actually the main effect of A, the two-factor interaction of B and D, or any linear combination of the two. What I have is an ambiguity, and I need to make more runs in order to resolve it.
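As a small illustration of the model-counting argument above — purely my own enumeration, not anything the platform does — here is the heredity-restricted candidate set for active main effects C, D, and F, compared with the unrestricted set over all 15 two-factor interactions among six factors.

```python
# Count candidate interaction models with and without the heredity assumption.
from itertools import combinations, chain

def powerset(items):
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

active = ["C", "D", "F"]
heredity_2fis = list(combinations(active, 2))        # CD, CF, DF
print(len(heredity_2fis), 2 ** len(heredity_2fis))   # 3 interactions -> 8 models

all_2fis = list(combinations("ABCDEF", 2))
print(len(all_2fis), 2 ** len(all_2fis))             # 15 interactions -> 32768 models

for model in powerset(heredity_2fis):                # the eight heredity-consistent models
    print(model)
```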
Going back to my example, what happens next in the textbook is that the design is folded over, so instead of eight runs there are 16 runs. Let me show you that example. Here's the folded-over design. If I run Evaluate Design and look at the Alias Matrix, I can see that the Alias Matrix is identically zero everywhere. I can learn the same thing by going to the color map on correlations: the main effects are orthogonal to each other, and all the main effects are orthogonal to all the two-factor interactions. I now know that I have a foldover design and that my main effects are not going to be biased by two-factor interactions.

I have data for the time it takes for the eye to focus. If I click on this script, I see the result of having done the foldover. What I see first is that B and D are the two main effects, as before, except that A is no longer there because, guess what, the BD interaction is massively significant. The true model is B and D plus the BD interaction. Now we can run this model, and we see first that our actual-by-predicted plot and our residuals all look good. Then, playing with the profiler, we can see that as I move B from one end to the other, the slope of the prediction line for the effect of D on time changes. That's the nature of interactions: when you have an interaction, the slope of one factor depends on the value of the other factor. So this is the setting you would use if you wanted to maximize the time it takes the eye to focus; generally you would want to minimize the time, and this would be the setting to use to minimize the eye focus time. That's the end of that example.

Let me go back to my slides for just one second and introduce the Peanut Solids example. This is an example that my friend Chris Nachtsheim actually did in a consulting environment, and we have it in the sample data library as the Peanut Solids definitive screening design experiment. What I did instead was create a two-level foldover design and use the same model to generate data for it. Let me show you that data, which is my second example here, the peanut example. Notice that I have pH, water temperature, extraction time, ratio, agitation speed, and two categorical factors — whether you hydrolyze the peanuts first and whether you presoak the peanuts — and what is being measured is the peanut solids. Notice also that the number of runs here is 22, which is not a multiple of eight, and therefore, when I look at Evaluate Design, I don't expect this design to be orthogonal for the main effects. When I look at the Alias Matrix, you can see that the intercept is aliased by a very small amount with any active two-factor interaction. Again, if I look at the correlation color map, you can see that the main effects are not orthogonal to each other, but their absolute correlations are very small, like 1/11. And again, because this area is white, main effects are orthogonal to two-factor interactions. This is what we wanted; this is what we have to have in order to use the Fit Definitive Screening platform. Now I run that platform, and what I see is that I have four active main effects, which means that I have as many as four choose two, or six, two-factor interactions to check.
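Here is a short numpy sketch of the aliasing being described, using the usual textbook generators for a 2^(7-4) resolution III design (D = AB, E = AC, F = BC, G = ABC); the design in the demo may use different generators, so treat this as an illustration of the idea rather than a reproduction of the same table. In the 8-run design the A column is identical to the B×D column, and after folding over that confounding disappears.

```python
# Aliasing in a resolution III design, and how folding over removes it.
import numpy as np
from itertools import product

base = np.array(list(product([-1, 1], repeat=3)), dtype=float)   # full factorial in A, B, C
A, B, C = base.T
design8 = np.column_stack([A, B, C, A*B, A*C, B*C, A*B*C])        # 8-run res III: D=AB, E=AC, F=BC, G=ABC

# In the 8-run design, the A column equals the B*D column: A and BD are confounded.
D = design8[:, 3]
print(np.allclose(design8[:, 0], B * D))                          # True

# Fold the design over (append the sign-reversed runs): main effects become
# orthogonal to all two-factor interactions, so A is no longer biased by BD.
design16 = np.vstack([design8, -design8])
A16, B16, D16 = design16[:, 0], design16[:, 1], design16[:, 3]
print(float(A16 @ (B16 * D16)))                                   # 0.0
```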
It looks like ratio times agitation speed is a term that I don't really need; its estimate is very small. The true model that generated the data involves these four two-factor interactions here. Ratio times agitation speed turns out to be a type I error, but we would probably get rid of it anyway, since its estimate is very small. Here we've used Fit Definitive Screening, along with the assumption of heredity between main effects and two-factor interactions, to find not only all the main effects but also four active two-factor interactions among the interactions related to the active main effects. We found the actual data-generating model — the correct model.

So what do I recommend that you do? First, I've shown you how to use the optimality criterion in the Custom Designer to create foldover designs. You can do that relatively simply, and you don't necessarily have to create orthogonal foldover designs; the Fit Definitive Screening platform doesn't care whether the design is orthogonal or not, and it will still analyze the data as long as the design is a foldover design. Once you have the foldover design and the data, you use the Fit Definitive Screening platform to analyze the data. So, in the words of Nike, "Just do it."

Here are some references. The first two are the original paper on definitive screening experiments and the paper by Xiao and co-authors, which shows how we create Definitive Screening Designs nowadays without any optimization, by direct construction using conference matrices. Then the paper by Miller and Sitter contains the basic idea I've introduced here for analyzing foldover designs; this was in Technometrics back in 2005, though we're using slightly more current model selection techniques than Miller and Sitter did. Finally, the last reference, again in Technometrics, is a paper Chris Nachtsheim and I wrote that explains how to use this foldover technique and analysis to do model selection for Definitive Screening Designs in the two-step method I talked about at the very beginning of this talk. Thank you very much for your attention, and I'll be at the talk when it's finally delivered to answer questions.
Textual analysis of written documents has become an important analytics tool in accounting and finance decision making. Several research papers have expanded textual analysis and have also measured the written tone in financial documents, converting the tone into a quantitative score of optimism/pessimism. Some of this research has connected the tone to abnormal returns in the financial markets. We explore this connection with the Sentiment Analysis platform in JMP Pro 16.

Hello everyone, my name is Nilofar Varzgani. Today I'm going to present a research study that I conducted using JMP Pro 16, and specifically, within JMP Pro 16, I used the Text Explorer platform as well as the Sentiment Analysis functionality within the Text Explorer platform. The title of this study is Textual Analysis of Earnings Conference Calls: Differences Between Firms. I'm working on this study with my two co-authors, Dr. Ugras and Dr. Ayaydin. They're not going to be presenting with me here today, but they will surely attend the presentation itself.

I'll start off with a little bit of an introduction. Textual analysis of written documents has become an important analytics tool in accounting and finance decision making. Several research papers have expanded textual analysis and have also measured the written tone in financial documents, converting the tone into a quantitative score that measures optimism or pessimism in the tone of the speaker. Some researchers have also connected this tone to abnormal returns in the financial markets. In my presentation today, I'm going to talk about how we explore this connection using the Sentiment Analysis platform within JMP Pro 16.

Let's talk a little bit about the motivation behind this study. For many years, capital market studies have researched whether quantitative information reported by firms, such as earnings, revenue, or other accounting measures, influences decision making. Recent studies have shown that, in addition to this type of quantitative information, qualitative information from the firms and from the media influences investor behavior. This qualitative information includes text in 10-K reports, earnings press releases, conference call transcripts, comment letters to the SEC, analysts' remarks, articles in the media, and conversations on social media. Several studies have also shown the importance of the earnings conference calls that immediately follow the quarterly earnings releases of public companies. In our study, we examine whether the impact of earnings conference call tone varies across different groups of companies.

Let's do a little bit of background on the literature that covers textual analysis so far. Textual analysis has been used to analyze a variety of documents through alternative approaches, and one can categorize these approaches into three broad categories. The first is the use of the Fog Index, which is basically a function of two variables: the average sentence length in number of words, and the word complexity, measured as the percentage of words with more than two syllables. The second category of techniques is the length of the report; although this seems like a rather simplistic approach, it has been useful precisely because of its simplicity, and a couple of studies have used the length of the report as a proxy for the complexity of the report itself.
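For reference, the commonly cited Gunning Fog formula combines exactly the two inputs described above: 0.4 × (average words per sentence + 100 × the fraction of complex words). The Python sketch below uses a crude vowel-group heuristic to count syllables and a made-up sentence, so the number it prints is only illustrative.

```python
# Rough Fog Index sketch; the syllable counter is a cheap approximation.
import re

def syllables(word: str) -> int:
    # Count groups of consecutive vowels as a crude syllable estimate.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fog_index(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    complex_words = [w for w in words if syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))

print(fog_index("Management anticipates substantial revenue growth. Margins improved."))
```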
The third approach is the use of a word list. There are a number of word lists that people have created themselves, such as the Henry word list or the Loughran and McDonald word list. In our study, however, we utilized the built-in dictionary that JMP Pro comes with, and we augmented it with some phrases, which we added as terms, as well as a custom list of stop words based on the sample data we were working with.

Let's talk a little bit about the data itself. Our sample size is approximately 25,000 observations, which means we analyzed close to 25,000 earnings call transcripts, and the date range for those transcripts is from 2007 Quarter 1 to 2020 Quarter 4. We have tried to incorporate only the text portion of each earnings call transcript, removing any graphics or special characters that might be part of it. All of these transcripts were downloaded from the LexisNexis database in RTF format.

Just to give you a little bit of an intro as to what an earnings call transcript looks like: it starts with the title, which mentions the name of the company for which the earnings call announcement is being made. It has the words Fair Disclosure Wire on the next line, followed by the date. Then the main call itself is divided into two sections. You have the presentation section, which contains the prepared remarks of the managerial team attending the call, and then you have a discussion between the analysts, who sit in on the call live and ask questions, and the managers, who respond to those questions — so the prepared remarks and the Q&A portion. For this study specifically, we've only looked at the prepared remarks portion of the earnings transcripts, but later on, as an extension of this study, we plan to incorporate the Q&A portion of the earnings call as well. In addition to those two blocks of text, most of these transcripts also include a list of all the participants on the call, which includes all the managers from the company side as well as all the analysts from different institutional investor sites.

Let's talk about the methodology a little bit. We extracted the transcript, and the section with the prepared remarks of the managers was titled DocBody. The Q&A was titled Discussion, and we counted the number of analysts who attended each call. Keep in mind that, because not all calls have a Q&A segment, the Q&A part might be missing for some of the rows in our sample, which is why for this conference and this study we focused only on the DocBody, the prepared remarks portion of the earnings call. We also created columns that could be used as identifiers: a ticker column for further analysis, the year-quarter, as well as a calculated column that measures the length, in number of words, of the prepared remarks section. The distribution of length is interesting, and we're going to show that output in a little bit. Before we moved on to the Text Explorer platform, we changed the data type for the document body to character and the modeling type to unstructured text so that the Text Explorer platform can work with it.
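As a rough illustration of those identifier and length columns — built here in pandas with invented ticker names, dates, and text rather than in the JMP table the presenters actually used — the year-quarter and word-count calculations might look like this:

```python
# Minimal pandas analogue of the identifier and length columns described above.
import pandas as pd

calls = pd.DataFrame({
    "Ticker": ["AAA", "BBB"],
    "CallDate": pd.to_datetime(["2007-02-01", "2020-11-03"]),
    "DocBody": ["Revenue grew strongly this quarter ...",
                "Demand was weaker than expected ..."],
})

calls["YearQuarter"] = calls["CallDate"].dt.to_period("Q").astype(str)   # e.g. 2007Q1
calls["Length"] = calls["DocBody"].str.split().str.len()                 # words in prepared remarks
print(calls[["Ticker", "YearQuarter", "Length"]])
```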
Before I show you the Text Explorer platform, I just want to talk a little bit about the terminology that will be used a lot in the Text Explorer platform and its output. In textual analytics, a term or token is the smallest piece of text, similar to a word in a sentence. You can define terms in many ways, for example with regular expressions, and the process of breaking the text down into terms is called tokenization. Another important term that will pop up a lot when we look at the platform's output is a phrase, which is a short collection of terms. The platform has options to manage phrases so that they can be specified as terms in their own right. For example, in our earnings call study, a phrase that popped up a lot was "effective tax rate." Although effective, tax, and rate are three separate terms, "effective tax rate" is used together most of the time, so we converted that phrase into a term itself so that we could analyze how many times that particular phrase as a whole is used in these conference calls. Next is the document: a document refers to a collection of words, and in a JMP data table the unstructured text in each row of the text column corresponds to a document. Then we have the corpus, which is the collection of all the documents.

Another important term we will use later in the output is stop words. Stop words are common words that you want to exclude from the analysis. JMP does come with its own list of stop words, but there might be specific stop words in the data sample you are using that apply to that data set only. We created a custom list of stop words, which you can easily view in the Text Explorer platform; you can keep a list of stop words in an Excel or txt file, upload it within the Text Explorer platform, and use those as stop words. Finally, there is the process of stemming, which combines words with identical beginnings, or stems, so that similarly rooted words are compiled as one word; for example, jump, jumped, and jumping would all be treated as a single word instead of three separate words. For our study we decided to go with the no-stemming option, because we noticed some issues with stemming. For example, a word like "ration" could be used as a stem for words like "acceleration," which has nothing to do with that word itself. So we went with the no-stemming option in our case.

On this slide we look at the options we selected for the Text Explorer platform. The main variable of interest is the DocBody. We use the ID to identify each row of observation, and we changed the default options for each of these features — the maximum words per phrase, the maximum number of phrases, and the minimum and maximum characters per word — to values we thought would be suitable for our particular data set. As you can see, we increased the maximums to well above the defaults, just to be on the safe side, so that we don't miss any important terms in our analysis. The initial output that pops up once you run the Text Explorer platform gives you a list of all the terms that were used most in the data sample, as well as the phrases. We reviewed the list of phrases and selected the phrases that could be used as terms.
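JMP's Text Explorer handles tokenization, phrases, and stop words internally; as a conceptual analogue only, here is a small scikit-learn sketch on two invented "documents" that keeps single terms plus two- and three-word phrases (so something like "effective tax rate" survives as a unit) and drops a custom stop-word list, with no stemming, mirroring the choices described above.

```python
# Conceptual analogue of terms, phrases, and custom stop words (not JMP's engine).
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "Our effective tax rate improved and revenue growth was strong.",
    "Revenue growth slowed but the effective tax rate was stable.",
]
custom_stops = ["our", "and", "was", "but", "the"]   # stand-in for a custom stop-word list

# ngram_range=(1, 3) keeps single terms plus 2- and 3-word phrases such as
# "effective tax rate", which can then be promoted to terms in their own right.
vec = CountVectorizer(stop_words=custom_stops, ngram_range=(1, 3), lowercase=True)
counts = vec.fit_transform(docs)
print(sorted(vec.get_feature_names_out()))
```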
There were a total of 30,000 phrases, out of which 1,068 phrases were added to the term list. In addition to that, we also created our custom list of stop words. We found it easier to export all the terms from our JMP sample into Excel, sort those words, and then treat as stop words all the words with certain characteristics — for example, words containing symbols, commas, or dollar signs, numbers that were being treated as text, or common names such as John, Michael, David, etc. We added all of those to our stop-word list and uploaded it into the Text Explorer platform.

Let's look at some of our output. The first analysis we did was on the variable length of the prepared remarks. The assumption here is that if the prepared remarks section is longer, the management has more to explain to investors, and that is why the complexity or the tone of those reports might differ from shorter reports where the managers don't have to do a lot of explaining. As you can see from the distribution output on the left, the length in our sample is slightly asymmetric, with a small tail toward the right-hand side, which means that some reports were longer than others. The mean length is around 3,027 words and the median is around 2,966 words. The mean and the median are not far from each other, which probably means we can treat the distribution as roughly symmetric even though the histogram looks asymmetric; the difference between the mean and the median is not large. We did look at the median length of the reports over the years, and as you can see, in 2007 the earnings calls were much longer than in the years after that, and then we saw a slight bump in 2020 as well. If you look at the quarter-wise length of the reports, you'll also notice that Q4 generally has the longest reports, because the management is explaining the functions and operations of the company for the whole year and compiling results from the previous three quarters as well. In terms of the tickers with the longest average length, Boston Properties reported the longest, at approximately 6,000 words on average, which is double the average length of the whole data set.

Next we have the word cloud. Just to compare the stemming and non-stemming options, on the screen you see both the word cloud with stemming, on the right-hand side, and the one without stemming. We preferred the no-stemming option because it lets us see the words that show up most often in these earnings calls, whereas the stemming option can end up with a word cloud that is not very explanatory. As you can see, growth, new, revenue, increase — these are the words that pop up the most, which signals that managers are mostly optimistic and positive in tone in the prepared remarks section of their reports. Then I also have a screenshot of the Sentiment Analysis platform, which shows the distribution of the overall tone, positive or negative.
As you can see from this histogram, the overall tone of these prepared remarks is mostly very positive, with only a very few earnings call transcripts falling in the negative portion of the distribution. This again signals that managers tend to be more positive and more optimistic when talking about the operations of the company, so that they can signal that the future is going to be bright and better, and that definitely affects how investors react to this tone. Next, we also looked at the overall sentiment of the calls, as well as the positive mean and the negative mean of the sentiments. As you can see, the positive sentiment is mostly around a value of 60, whereas for the negative sentiment we see a bump around minus 40, so none of these earnings calls were too negative, even if the company's performance was really bad for that particular quarter — because managers want to signal a brighter future and not focus too much on the history.

If you look at the overall sentiment versus the years, you'll notice that the overall sentiment was much lower during the financial crisis of 2007-2008, then bumped up strongly in 2009-2010, and overall it has been relatively steady except for 2020, when the pandemic hit. If you break it down quarter-wise, you can see from the bottom center graph that some quarters, specifically the fourth quarter, can see a drop in the overall sentiment across the whole data set. If you look at length versus year, again you'll notice that length was much higher in 2007, dropped in 2008, peaked again in 2009, and overall has decreased over time until 2020. It might be a safe assumption that when times are tough and companies have more to explain, the earnings calls tend to become longer and the prepared remarks are longer. However, if you look at length versus overall sentiment, you'll notice that there seems to be a slight positive relationship between them, but it's definitely not a simple linear upward trend; instead, the data is quite heteroscedastic.

Here I have a list of the companies with the highest overall sentiment over the years versus the companies with the lowest overall sentiment over the years. I also include the industries they belong to, just as an interesting piece of information we noticed: for example, a lot of the most positive calls were from the technology services or financial services areas, whereas the lowest sentiment was in the waste management or medical technology industries. In terms of the future research we plan to do on this topic, we want to examine the tone of these earnings calls and do a cross analysis with variables like managerial strategic incentives for disclosure, the impact the tone has on analysts and investors, as well as variables specific to the firm, such as size, complexity, age, etc. We also plan to explore term selection for building data mining models using the Text Explorer platform within JMP Pro. Thank you so much for attending this presentation, and hopefully we can answer any questions that you may have about it today. Thank you.
Åke Öhrlund, Galderma

Data from a tensile tester contained 92 runs, each with four columns and 3,800 rows. The sample name was between the header and the data, in one of the four columns. At first, the data was imported and stacked, which omitted the sample name. Next, the sample name was imported and stacked. Finally, the sample name was joined to the data table, allowing visualization and analysis. Instead of preparing the data in Excel, it was imported into JMP and formatted there, saving hours of work and preventing possible errors.

Hi, my name is Åke Öhrlund. I work at Galderma in Uppsala, Sweden. I've been using JMP for many, many years, almost my whole working life so far, but apparently I'm still learning, and that's what I want to share today. One of the things I've come to realize lately is that I've spent too much time preparing data for JMP in Excel, cutting and pasting. I want to show you an example of how you can do that much faster in JMP.

I got this data set from a colleague of mine. It was tensile testing data: 92 runs, each one with 3,900 rows of data. It was laid out like this, with four columns for each run, so four columns repeated 92 times. On row number three there was a sample name crammed in, and then the rows of data followed. I couldn't bring it directly into JMP as it was, so she said, "Do you want me to cut and paste it into the same columns like you usually have in JMP?" I said, "No, I'm going to try to do it directly in JMP."

This is what I did. I started by importing the data, leaving out row number three. I just have to tell JMP that there are two header rows and that it should skip row number three and start on row number four. I do that, and I have all the data with the two rows of header on top. But this is a wide table, so I want to stack it. I select all the columns and put them into Stack. Now I have to tell JMP that there are four of these columns that go together, and JMP actually seems to group them four by four the way they should be. I just click OK, and then I have a table containing all the data, a lot of data rows stacked on top of each other. I have only four columns actually containing the data, and four columns called Label that describe what is in those data columns. I could work from here, but I'd rather also have the sample name in there.

So I do the import again and start all over. This time I want row number three: I still include both header rows, but I also include row number three, so the data starts here. I click Next, and then I tell JMP to skip everything after row three. There you go: one row of data containing the sample names. This is still wide, so I want it in the long form like the data table, so I do Stack again and select all the columns into Stack. This time I don't have to tell JMP anything; it just stacks what's there. And there you have it: another data table with all the sample names. I named this column Sample. This one I can match with the original data table.

That's what I'll do. I go to the data table, which is named Untitled 3. Then I go to Update, and in Update I say, "Update with data from Untitled 6." What should I take from there? I select the column Sample, replace nothing in the old data table, and match the Label here with the Label there. I click OK, and I end up with a data table with all the data and all the sample names. Again, I could work from here, but since this was quite fast, I want to tidy up a bit. I want these labels to go up here, this label to go up there, and so on.
I just copy these, go down here, and paste them into Data 2, Data 3, Data 4, and they end up here. After doing that, I don't need the Label columns anymore, so I can just remove those. Now I have all the data in their columns, with the right column headings, and the sample name here. Now, of course, this is where the actual fun begins: it's very easy now to pick out certain samples and do whatever you like. That's the fun part of JMP, of course. What I want to convey here is: try not to waste too much time in Excel cutting and pasting, because chances are you might as well do it in JMP in much, much less time. I've been using JMP for many, many years and am still discovering all the things you can do, so I'm going to try this a lot more. That's all from me today. Thank you very much for listening.
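Outside JMP, the same stack-then-join pattern can be sketched in pandas. The tiny wide table, column names, and sample names below are invented stand-ins for the real 92-run, 3,900-row export; the point is only the shape of the workflow: reshape the grouped columns to long form, then attach the sample name that lived on its own row.

```python
# Pandas analogue of the JMP workflow: stack the wide data, then join sample names.
import pandas as pd

# Wide data: four measurement columns per run (here only two runs, suffixed _1, _2).
wide = pd.DataFrame({
    "Time_1": [0.0, 0.1, 0.2], "Force_1": [1.0, 1.2, 1.4],
    "Gap_1":  [5.0, 4.9, 4.8], "Speed_1": [0.5, 0.5, 0.5],
    "Time_2": [0.0, 0.1, 0.2], "Force_2": [0.9, 1.1, 1.3],
    "Gap_2":  [5.1, 5.0, 4.9], "Speed_2": [0.4, 0.4, 0.4],
})

# The sample names that sat on their own row in the export, one per run.
samples = pd.DataFrame({"Run": ["1", "2"], "Sample": ["Gel A", "Gel B"]})

# Stack: wide_to_long groups the four measurement columns by their run suffix,
# which mirrors stacking four columns at a time in JMP.
wide["row"] = wide.index
long = pd.wide_to_long(
    wide, stubnames=["Time", "Force", "Gap", "Speed"],
    i="row", j="Run", sep="_", suffix=r"\d+"
).reset_index()
long["Run"] = long["Run"].astype(str)

# Join: bring the sample name onto every data row (the Update step in JMP).
long = long.merge(samples, on="Run", how="left")
print(long.head())
```

The workflow in the talk stays entirely in the JMP GUI; this is just an equivalent reshape for readers who prefer code.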
Structural equation modelling (SEM) is a method of model construction that displays the variance and covariance in and between latent and manifest variables visually to define a system. Science and engineering models are typically constructed using various relationships, but these relationships are usually given in the form of equations, which can rapidly become very complex. SEM provides a visual method to determine how each of the measured variables is affected by the underlying cause of these observable responses. Usually, SEM is applied in the fields of psychology and sociology, where relationships between the underlying cause of measurable variables and the measurable variables themselves cannot be directly ascertained. However, there is scope for this method to be applied to science and engineering problems to help understand the underlying causes of responses within a system. In this poster, the concept is demonstrated with one of the simplest models in SEM, linear regression, which forms the basis of many models within engineering. This basic building block of SEM can then be extrapolated to systems with many variables to determine the overall effect on the system, by allowing any variable to be treated as both a cause and an effect of other variables.

Hi, I'm Jordan Walters, I'm a Technical Intern at JMP, and I'm going to present to you today about structural equation modeling and how we can use it with linear regression to find relationships that might not otherwise be obvious. To do this, we're going to look at a case study about the manufacturing of pharmaceutical tablets. Within this case study, there are lots of factors and responses that need monitoring, and we need to determine the relationships between these factors and responses to work out how the system works. Doing this will allow us to find the optimal conditions to make the best tablet possible. To make the case study a lot simpler, rather than focusing on all the responses and all the factors, we're just going to focus on one response and one factor. In this case, the one factor we're looking at is the percent composition of water, and the one response is the density of the tablet, which is going to be our metric for whether or not we have a good tablet. Just for reference, the tablet density is in milligrams per centimeter cubed.

Typically, the way we define the relationship between this response and factor is through a simple linear regression, which, as the name suggests, gives us a linear relationship between the response and the factor, and this is what we need to fully understand the system. The problem with linear regression is that it only allows us to find the direct effect of changing the water composition on the density of the tablet. It's not going to let us find anything else about the system or any underlying features, and because of this it might fail to give a complete picture of the system. This is where structural equation modeling can come in, to give us a more comprehensive view of the system, which will allow us to find relationships that might not otherwise be obvious. A bit of background on linear regression: in its simplest form, linear regression is just a linear relationship between one variable and another.
And since linear regression is, by definition, linear, the two variables are connected by a linear equation, which in this case is Y = mX + C, where Y is the response, X is the factor, m is our gradient, and C is our offset. The m and the C are just constants that we find to fit this regression equation so that we can relate our Y and X variables. To do this in the context of the case study, we plot the X and Y variables and come up with the regression equation in JMP, and JMP can very simply do this for us. It gives us a lot of information, some of which we don't need for performing this linear regression, but the two really important pieces of information we get are the regression coefficients, which are our m and our C, shown about here, and the regression equation, which is the full form of our linear regression. JMP gives this to us in the form Y = mX + C, but for clarity, and to see how this works in our case study, it's presented here as the actual relationship: the density is equal to 0.117 times the percent water composition plus 95.903, which is reported above with the m and C values. All this relationship means is that for every one-unit increase in percent H2O, we get a 0.117-unit increase in density. But we have that C value at the end, which offsets our density by 95.903. The purpose of the C value is to give a scale to our data: without that scale, what we have is just a relationship between density and percent composition of H2O; adding this constant at the end gives us the scale to put the result in the units we need for this case study. So now that we've got an idea of how we traditionally go about looking at this problem, let's start building the structural equation model.

To transform a traditional linear regression into the form of a structural equation model, we need to do a few things. We begin by moving this relationship away from its graphical form into this visual path-diagram form, where we can see our X factor in a rectangle linked to the other rectangle, which is our response, and between these two rectangles there is an arrow. A single-headed arrow just means a relationship going one way; in structural equation modeling it is possible to have a double-headed arrow, but that's not particularly important in this example. What the single-headed arrow means is that the H2O composition affects the density of the tablet, but the density of the tablet does not affect the H2O composition, and this allows us, with this single X and Y, to perform linear regression. Now, the confusing part about this diagram is probably the 1 in the triangle at the top. Simply put, all it does — and it is usually hidden within a structural equation model — is set the scale for the entire model, and we can see that through how the structural equation model reports the data. If you look along the arrow between the 1 in the triangle and the Y response, the density of the tablet, you'll see that we actually get a number on it, 95.903, which we've seen before: it's the C value from our linear regression. This further shows that this part of the diagram is what gives the model its scale. So we can see where the C comes from; that leads directly to the question of where the m comes from.
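As a minimal sketch of the same fit outside JMP — on synthetic (%H2O, density) values, so the coefficients will not reproduce the 0.117 and 95.903 reported from the case-study data — ordinary least squares returns exactly the m and C being discussed:

```python
# Simple OLS on made-up (%H2O, density) data to read off slope m and intercept C.
import numpy as np

h2o = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])                   # % H2O composition
density = 95.9 + 0.12 * h2o + np.random.default_rng(0).normal(0, 0.05, 6)

m, c = np.polyfit(h2o, density, deg=1)                            # slope, intercept
print(f"density ~ {m:.3f} * %H2O + {c:.3f}")
```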
Well, the m actually comes from the connection at the bottom there, between the percent H2O composition and the density of the tablet, and we can see that it is 0.117, exactly the same as we found before. Bear in mind that these two analyses were performed in two different platforms. Now, we have the two parts of our Y = mX + C equation there, but this schematic has three sides, so you might be wondering what that final side is. Simply put, the left-hand side, between the 1 and the %H2O composition, is just the X part of the SEM. Similar to how the C value, which gives the offset of Y, is the Y-intercept of the graph — which is typically important and quite useful statistically — we also get this X value from the structural equation model. It isn't particularly important, because we don't usually look at the X-intercept, but it's interesting that the analysis can provide this additional information without our really asking for it. This is all computed automatically, like I say, and typically this top part is hidden; it's shown here just to give context as to what's actually happening.

Looking at what we've got here, we have these solid lines, which denote that something is statistically significant, and between the H2O composition and the density of the tablet we have a dashed line, which denotes that that relationship is not statistically significant. The fact that the coefficient is very low, at 0.117, implies it's not that strongly related anyway. So if we took this at face value, we might be inclined to agree with our initial linear regression and say there's no effect of H2O composition on the density of the tablet. But since we've already got our data in the form of a structural equation model, let's continue exploring and see if we can uncover any hidden relationships.

We know that this entire case study is built up of lots of X factors and lots of Y responses, and we chose to simplify it down to just one factor and one response. So let's think about some of the other factors within the system and how they might relate to this response. One example that occurs quite a lot within this case study is a mediation variable. What we mean by a mediation variable is a variable that can't exactly be controlled directly, but that has an impact on our final response while still being affected by one of the factors we can control. In this case, we're going to take crushing strength as one of our mediation variables. The reason this is a mediation variable is that the crushing strength can be changed in this process, and it does have an effect on the density of the tablet, but the crushing strength can only be [inaudible 00:08:50] a certain operational range depending on the water composition within the tablet. If the tablet is too dry, we need to change the crushing strength to match, so it doesn't completely powder the tablet; if it's too moist, with too much water in it, then it might need a much softer crushing. In this way, you can see that this is a variable we can't change directly, because it is affected by the water composition, but it is somewhat important to the density of the tablet, which we'll explore now.
If we add this into our path diagram in our structural equation model, you can see that we get a new triangle between our variables, this time with the number 1 in the middle. What this is doing, in the context of the case study, is saying that we now have our direct effect, which is the connection between the H2O composition and the density of the tablet, and we also have our indirect effect, which comes from the connection that goes all the way around the diagram, through crushing strength and into the density of the tablet. What this means is that this is actually a two-part equation: not only do we have an extra C, as before when there was one X variable, we now also have another variable in the middle part of the equation. Because of that, the first thing we notice on the path diagram is that the connection between the H2O composition and the density of the tablet is 0.054, which is already a much weaker relationship than we saw last time, when we included just the one variable. That's because the rest of the effect actually comes from elsewhere, as we're exploring here. So again we can see that that connection still is not statistically significant, and in fact has an even weaker coefficient than before. Now, if we look at our other connections, say between crushing strength and the density of the tablet, we see a statistically significant and reasonably strong relationship at 0.593.

So what does this actually mean in the context of this case study? It means a couple of things, and we can draw a couple of conclusions. Firstly, the direct relationship between the H2O composition and the density of the tablet is not statistically significant, which isn't something we knew before; it was hidden by the fact that we only performed the linear regression. The second thing we've learned is that crushing strength and the density of the tablet are correlated. In a practical sense, this means we can conclude that if we want to control the density of our tablet, then this must be done by changing the H2O composition, but only as a means to alter the crushing strength to the desired value. So if we actually came to optimize this entire case study, the crushing strength is where we want to pay attention, and to anything we can do to affect it — which isn't something we can change with just a dial. In this way, you can see how SEM has allowed us to uncover relationships that may be missed by traditional modeling methods, and in turn it provides a much deeper understanding of how our system works. So next time you're exploring a system, I'd encourage you to consider applying SEM to help you understand the system better, both visually and in more depth.
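A rough regression-based sketch of the mediation idea, on simulated data and outside JMP's SEM platform: the direct effect is the %H2O coefficient once crushing strength is in the model, and the indirect effect is the product of the %H2O → crushing strength path and the crushing strength → density path. The variable names and effect sizes are invented for illustration.

```python
# Direct vs. indirect (mediated) effects via two regressions on simulated data.
import numpy as np

rng = np.random.default_rng(7)
h2o = rng.normal(5, 1, 200)                        # factor we can set
crush = 2.0 * h2o + rng.normal(0, 1, 200)          # mediator, driven by %H2O
density = 0.05 * h2o + 0.6 * crush + rng.normal(0, 1, 200)

# Path a: %H2O -> crushing strength
a = np.polyfit(h2o, crush, 1)[0]
# Paths b (mediator -> density) and c' (direct), from a joint regression of density
X = np.column_stack([np.ones_like(h2o), h2o, crush])
_, c_direct, b = np.linalg.lstsq(X, density, rcond=None)[0]

print(f"direct effect  c' = {c_direct:.3f}")
print(f"indirect effect a*b = {a * b:.3f}")        # most of the effect flows via crushing strength
```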
The BioChaperone platform developed by Adocia covers oligomers of various sizes. However, they have the common characteristic of being of higher molecular weight than the impurities generated during their synthesis process. Thus, their purification can be achieved by diafiltration, or tangential-flow filtration. This technique, thanks to a membrane with pores of defined size, makes it possible to separate molecules according to their molecular weight. Different parameters such as temperature, tangential flow rate, and pressures have an impact on the purification efficiency, as well as on the duration of the operation. However, the multivariate study of this step was hampered by technical constraints, such as the impossibility of changing the temperature between each trial, thus limiting the randomization. Therefore, a split-plot design approach implementing a randomized block system was developed to optimize the purification of our excipients.

Hi everyone. First of all, I want to thank the Discovery Summit committee for giving me the opportunity to present the work we performed in my team on the implementation of a split-plot design to study the purification of our innovative excipients. First, ADOCIA is a biotechnology company founded in 2005 by Gérard Soula and his two sons, and we are located in Lyon, France. Our mission is to develop innovative formulations of approved hormones for the treatment of diabetes and obesity. The business model is to license products after proof of concept. Currently in our pipeline we have three patented technology platforms and one product that is approved to enter phase three in China, five products with clinical proof of concept, and six projects at the preclinical stage. We are about 115 people at ADOCIA, and 80 percent are dedicated to R&D.

Speaking about the technology platforms, today we will talk about the historical one, which is the BioChaperone platform. BioChaperone is a pharmaceutical excipient, a synthetic organic one, and it forms a complex with a protein such as insulin, amylin, or glucagon. This complex, inspired by nature, will improve the solubility or stability, accelerate the absorption of the peptide, or protect it against enzymatic degradation. The BioChaperone platform potentiates the performance of insulins and other hormones, and today five proprietary products based on BioChaperone are in clinical development.

The development of BioChaperone chemistry is, I think, the same as in many pharmaceutical companies. We go from an early-stage process that delivers batches of a few grams, which enter preclinical studies such as toxicology and efficacy. Once a BioChaperone is designated as a lead, it comes to my department, to my team, to develop a final process to deliver phase three batches and, in the end, commercial batches in the range of a few hundred kilos per year. The changes we face are imposed by large-scale feasibility, we are driven by cost and performance, and many changes are unavoidable, so we need to understand them and document them. The goal of our work is to have a complete understanding of the relationship between parameter variation and its impact on product quality. Indeed, as we will work at large scale, we know that temperature cannot be targeted at exactly 10.0 degrees every time; it can be 11, it can be 9. This is inherent to the large scale. We speak about robustness: the process needs to absorb this inherent variability within a defined range.
And we need to know its impact on product quality. At this stage we go from reproducibility at the early stage to robustness at large scale, at the final process stage. To do that, we have tools. Two of them are risk analysis, which helps us prioritize and rank the work, and DoE, which is very useful because we know that in chemistry there are a lot of interactions between parameters.

Today we will talk about the purification of the excipients. This purification is done by diafiltration. A quick overview of the process: raw material enters a chemical transformation that gives a crude excipient in solution, and this crude excipient in solution is purified by diafiltration to give the pure excipient in solution. What is diafiltration? First, there is classical filtration, called dead-end filtration, in which we have a solid — here an excipient — in a solution, and a membrane; the solution goes from top to bottom under pressure, and we recover the solid on the upper face of the membrane. In cross-flow filtration, we have the retentate, which is brought in parallel to the membrane using a pump, and we apply a pressure, using valves, that pushes part of the flux through the membrane. The membrane has pores of defined size and lets only small molecules go through, while the big molecules stay in the retentate. We use this technology because our excipients are oligomers, which means they are not small molecules; they are not polymers either, they are between the two sizes, but they are quite big and they stay in the retentate.

A quick overview of the unit: we have a vessel with the retentate, which is brought through the membrane using a feed pump, and a back-pressure valve allows us to maintain a pressure in the membrane and push part of the flux into the permeate, which eliminates the small-molecule impurities. On the right you see the 50-liter-scale diafiltration pilot that we use at ADOCIA; at the back is the 50-liter vessel, and on the upper left is the housing with the membrane and all the pipes, valves, and instruments used to monitor the whole process. For our study we had only one bulk of crude excipient solution that we could use. The idea was to have a full recirculation of the flux, meaning that the permeate is brought back to the retentate all the time, so that we have a retentate that is representative of the process at the beginning of every run of the DoE.

For a diafiltration, some factors are determined very early in the process: the membrane reference, meaning the cutoff size of the pores and the material of construction — once it's set, it's set, and we will not change it — and the loading, meaning the kilograms of excipient per surface of membrane; the membrane has a defined surface and the kilograms of excipient are defined by the process, so once it's set, it's set, and we will not change it. What we can tune as factors are the assay, the concentration of the BioChaperone in the solution that we need to purify; the feed flow, which is the flux imposed by the feed pump; the transmembrane pressure, which is a way to control the pressure that pushes the permeate through the membrane; and the temperature of the solution.
The responses we look at are the losses through the membrane, which impact the yield of the process; the impurities that go through the membrane into the permeate, which impact the quality of the product and are the most important part of this work; and the permeate flow rate, which impacts time and gives insight into the whole process time at large scale. The objective of the study was to define a design space, which is a multivariate space that guarantees the conformity of the responses; here, it's the quality of the product. And we go for a design type which is a response surface model. For our first attempt, we ran a Box-Behnken design. To build a Box-Behnken design, you go to DOE, Classical, Response Surface Design. I load the responses for run one, and I load the factors for run one. Okay. Here we have the three responses I spoke about earlier: the losses, the impurity elimination, and the permeate flux. And we have the four factors, which are temperature, pressure, concentration or assay, and feed flow. Box-Behnken is the first proposal in this box. We continue to make the table, and we have the standard Box-Behnken table with randomized runs, as you can see. We started to run this DoE, and after one trial it was clear that we would not be able to run it in a randomized order, because we cannot concentrate or dilute the bulk between each run. Concentrating the bulk through the membrane is perfectly feasible, but we would lose some impurities, and our bulk would no longer be representative of the upstream process, so it is not a solution. We could distill the bulk, but it would take a very long time because it's water. And the second parameter that is not easy to change between runs is the temperature: due to the recirculation, it takes quite a long time to stabilize the temperature between runs when we have to change it. So we did something that will make some people in the audience scream: we ordered the runs by temperature and by assay. Let me just add some color on it to have a better view, with value colors. This is what we did: we have the assay arranged in blocks, the 90 block, then the 60 block and the 30 block, and within each assay block we have the temperature ordered 40, 30, 20, and so on. We ran this DoE like that. Here are the data we obtained. We can analyze them using the Fit Model platform with the four factors; we look at the losses and run it. We have quite a good model for the losses: a good p-value, significant parameters. It's okay. But the thing is, we were quite disappointed by the statistical approach, because we know that the first rule is to randomize runs to get a good estimation of the error. It was not satisfactory. We went back to our studies, I would say. Just as a reminder, the state of play was that we cannot use a fresh batch for each run, because too much bulk would be required and we don't have it; the bulk assay cannot be changed between runs, to keep the bulk representative; and the temperature cannot be changed easily, because it would be highly time consuming. We looked at books, and at the end of many books we found a solution, which is the split-plot design. Split-plot designs were introduced in agriculture, because there you typically have hard-to-change factors such as the farming field. Let's say you want to study different treatments on different cereals or crops and you don't have enough room in one field: you will have many fields, and these fields are different. 
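As an aside from the transcript, the 27-run, four-factor Box-Behnken table and the "ordered by assay, then temperature" trick described above can be reproduced outside JMP. The following is a minimal Python sketch, not the JMP workflow itself: the assay (30/60/90) and temperature (20/30/40) settings are taken from the talk, while the feed-flow (CFF) and pressure (TMP) levels are left in coded units because their actual values are not given.

```python
# A minimal sketch of the 27-run, 4-factor Box-Behnken table, then sorted by
# the hard-to-change factors, mirroring what was done in JMP. CFF and TMP stay
# in coded units (-1, 0, +1) because their real settings are not stated.
from itertools import combinations
import pandas as pd

factors = ["Assay", "Temperature", "CFF", "TMP"]

# Box-Behnken: for each pair of factors, a 2^2 factorial at +/-1 with the
# other factors held at 0, plus center points (3 here -> 24 + 3 = 27 runs).
rows = []
for i, j in combinations(range(len(factors)), 2):
    for a in (-1, 1):
        for b in (-1, 1):
            run = [0] * len(factors)
            run[i], run[j] = a, b
            rows.append(run)
rows += [[0] * len(factors)] * 3          # center points

design = pd.DataFrame(rows, columns=factors)

# Map the two hard-to-change factors to the settings quoted in the talk.
design["Assay"] = design["Assay"].map({-1: 30, 0: 60, 1: 90})
design["Temperature"] = design["Temperature"].map({-1: 20, 0: 30, 1: 40})

# The randomization constraint: group runs into assay blocks, then order
# temperature within each block (what made "some people scream").
design = design.sort_values(["Assay", "Temperature"],
                            ascending=[False, False]).reset_index(drop=True)
print(design)
```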
But this is not the thing you want to study. The idea behind split-plot designs is to have whole plots, the fields, which are analyzed as random blocks. Then within each whole plot you apply treatment or crop one, two, three, four, and you study them inside the whole plot; these are called subplots. How is it done in JMP? I will close this, this, this, and this. To build a split-plot design, you go to the DOE platform, Custom Design. I load the responses for run two, and the factors. What do we see here? We see our four factors: assay, temperature, CFF, which is the feed flow, and the pressure. And we have an additional column, which is named Changes, where you can set whether a factor is very hard, hard, or easy to change. Very hard to change is the concentration, or assay. The hard-to-change factor is temperature, because we can change it, but not inside an assay block. And the easy-to-change factors are the two other factors, which can be randomized between runs. We want to go for a response surface model, so you click on RSM. Here we put six whole plots and 12 subplots. We have 36 runs, and we make the design. It will take a few seconds to make the design. Just to remind you, the Box-Behnken DoE was 27 runs; we have many more runs in this DoE. I will make the table, and as for the Box-Behnken, I will add some colors. Sorry, I will redo the table. Okay. You see that we have the assay arranged in blocks, corresponding to whole plots one, two, three, four, and so on, and within each assay block you have temperature blocks, which are the subplots one, two, three, four, and so on. CFF and TMP are randomized inside those two blocks. We performed this DoE, here are the data, and you can go to the model. Here we see the differences in the analysis between the standard Box-Behnken DoE and the split-plot design. We have the whole plots added to the effects, and they are treated as random blocks, and we have all the other parameters and effects. And here we have the method, which is the REML analysis method. We run the model. If we focus on the losses response, we see that we have a very good model, with 96 percent of the variation explained by this model, and we see that we have significant effects. The additional box we have to look at with this analysis is the REML variance components estimates, which gives us insight into the variance introduced by the blocks, the whole plots, and we see that the p-value is not significant. We can go further: there are no issues with the blocks, so as with any other DoE performed in JMP, we can go to the profiler to optimize, to define ranges and the design space. Here we see that the assay is the most impactful factor on every response, that losses is the response impacted by all four factors, and that with the parameters we use as targets, the optimization is pretty good. What could be interesting is to look at this DoE using a standard analysis. I remove everything, take the four factors and all this for a standard analysis, and run it. I come back to the losses response. Here I look at the losses. What we can see is that we don't have the exact same order for the parameter effects or estimates. 
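For readers who want to see the shape of this split-plot analysis outside JMP, here is a rough Python sketch of the REML idea described above: whole plots entering as random blocks and fixed effects estimated by REML. The file name and the column names (losses, Assay, Temperature, CFF, TMP, WholePlot, Subplot) are hypothetical, only main effects are shown rather than the full response surface model, and JMP's REML variance-component report is approximated here by the mixed-model summary.

```python
# A rough sketch (not the JMP analysis itself) of a split-plot style fit:
# whole plots as random blocks, estimation by REML. Column names are
# hypothetical, and only main effects are included for brevity.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("splitplot_runs.csv")    # hypothetical export of the 36-run table

model = smf.mixedlm(
    "losses ~ Assay + Temperature + CFF + TMP",    # fixed effects (main effects only)
    data=df,
    groups=df["WholePlot"],                        # whole plots treated as random blocks
    vc_formula={"Subplot": "0 + C(Subplot)"},      # extra variance for the temperature subplots
)
fit = model.fit(reml=True)                         # REML, as in the JMP report
print(fit.summary())                               # fixed effects and variance components
```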
In the standard analysis, we would say that assay is the most impactful factor, while in the split-plot analysis it is the flux and assay is the fifth one. This is quite normal: when we use blocks and do a standard analysis, we give more strength to these blocks, we make errors on them, and we declare them impactful when they are not. This is perfectly normal, and this is why you need to do an analysis using REML and blocks to get the right order of impactful factors. In conclusion, in this case study the split-plot design allowed us to carry on regardless of the non-randomization of some parameters. We were able to run the whole DoE in a shorter time frame, even though it required 36 runs versus 27, because we did not need to concentrate the bulk or stabilize the temperature between each run, so it was much shorter. And we are able to properly justify the design space with strong statistical evidence. It is worth noting that the split-plot design platform is now implemented to quickly develop and optimize our proprietary excipient purifications. As take-away messages: be careful about factor randomization. We know it's the first rule when running a DoE, but it's very important to have a proper design that gives you statistical knowledge of your process, and a proper design can save time even if more runs are required. Thank you for your attention.
The definitive screening design (DSD) is almost certainly the 21st century's most exciting and useful innovation in design of experiments (DOE). As a screening design, the DSD offers unique properties for a much smaller number of runs. And, if only half or fewer of the factors are active, the DSD gives you the ability to fit the full response surface model. However, if your objective is to estimate the full response surface model for most or all of your factors, a DSD is inappropriate. In those instances, larger optimal designs or central composite designs (CCDs) are the preferred choices. Orthogonal minimally aliased response surface (OMARS) designs are a new family of response surface designs (RSDs) that bridges the gap between the small, efficient DSD and the large, high-powered CCD. In this presentation, we introduce OMARS designs by way of a case study comparison with other designs. We also demonstrate how JMP users can create and evaluate OMARS designs against DSDs and classical RSDs in an easy-to-use add-in that will help you to select the right design for your specific application.       Okay, so welcome. I'm Phil Kay, and I'm joined by Hadley Myers. We're going to talk about OMARS Designs, this new family of design of experiments, and an add-in that gives you a gateway into that world. I'll start with an introduction, and I'm going to give you a motivating case study. I'm going to talk about how these OMARS Designs bridge the gap between the small, efficient Definitive Screening Designs that we're all familiar with, and the larger, high-powered, more traditional response surface designs that you might know of. Then I'll pass over to Hadley, and he'll talk about how you, as a JMP user, can create and evaluate different OMARS Designs with an add-in that he's been working on. These OMARS Designs come from a paper by Jose Nunez Ares and Peter Goos. I'll just show you that briefly. They've worked through the enumeration of thousands of such designs, and we'll introduce you to what these designs look like. I've got a motivating case study to begin. This is from a published case study, published in the Journal of Clinical Chemistry. It's a response surface design. It's about optimizing clinical chemical methods, an assay, in this case. The objective was to optimize an assay method, maximizing the response, which is called Elevated Serum 30 degrees C. They had six factors, each of which was a quantity of a different reagent, and they took a traditional approach. This was done quite some time ago. They generated a Central Composite Design, which is a very traditional response surface design for optimization, with 48 runs, including four center points. I've used this to motivate the use of OMARS Designs. As an alternative to this 48-run design, I generated a 17-run Definitive Screening Design for those six factors. I also generated an alternative 31-run OMARS Design. I took the model from the original 48-run experiment, so I used that data, fit a model, and used that to simulate the responses that we might expect for the Definitive Screening Design and for the OMARS Design. We added an appropriate amount of noise to that to give us a realistic response simulation. What you're going to see through this example is that the Definitive Screening Design is effective at what it should do, which is finding the most important factors. 
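The simulation step described above, fit a model to the published 48-run data, predict at the runs of the smaller design, and add noise, can be sketched in a few lines. This is a hypothetical reconstruction, not Phil's actual script: the file names, the response column name, and the shortened factor names (pH, P5P, OG, MDH, Asp, Tris) are illustrative, and the noise is scaled to the residual error of the big experiment.

```python
# A sketch of the response simulation: fit a full quadratic (RSM) model to the
# published 48-run CCD, predict at the 17-run DSD settings, and add noise at
# roughly the residual scale. File and column names are illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

ccd = pd.read_csv("ccd_48run.csv")        # published experiment (hypothetical export)
dsd = pd.read_csv("dsd_17run.csv")        # 17-run DSD factor settings

factors = ["pH", "P5P", "OG", "MDH", "Asp", "Tris"]
rsm_terms = (factors
             + [f"I({f}**2)" for f in factors]                          # quadratics
             + [f"{a}:{b}" for i, a in enumerate(factors)               # two-factor
                for b in factors[i + 1:]])                              # interactions
fit = smf.ols("Response ~ " + " + ".join(rsm_terms), data=ccd).fit()

rng = np.random.default_rng(2021)
noise_sd = np.sqrt(fit.mse_resid)         # residual error of the big experiment
dsd["Response"] = fit.predict(dsd) + rng.normal(0.0, noise_sd, len(dsd))
```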
The OMARS Design enables us to optimize the process by identifying and estimating all of the important effects from the response surface model. In this way, we are saying that these OMARS Designs, you can think of them as bridging the gap between Definitive Screening Designs and the traditional response surface method designs, like the Central Composite Design. Here is that Central Composite Design. Here are our six factors, and this is our response of interest. These are some of the models that we fit, and we're comparing. It's a traditional face-centered Central Composite Design. This is just three of the factors visualized. You can see we've got our axial points here on the face of the cube that's described by the factor ranges. Those kinds of designs are very good. They've got lots of nice properties in terms of the correlations between effects. You can see lots of white space here, which means zero correlations, orthogonal effects. They're not so great with the quadratic effects. There are fairly strong correlations of all of our quadratic effects with one another, which does reduce the power of our ability to estimate these quadratic effects. Nevertheless, we can fit a good model to that. This is the model fit to that original data. We identified that there really are four critical factors out of the six, and there are various higher order terms as well that are important. Really the pH, this P5P, OG, and MDH are very important. The L-aspartic acid and this Tris buffer are much less important. We can build a good model using that design. It's quite a big, expensive design, though: 48 runs. What would our alternatives be? Well, Definitive Screening Designs are obviously very good for screening in these kinds of situations, screening for the important factors. I've generated, using the same factors and the same factor ranges, a 17-run Definitive Screening Design for those six factors. Again, I've simulated the response data there based on the model from the published data from the big experiment. The Definitive Screening Design does what it's supposed to do. It finds that we've got these four important factors, the pH, P5P, OG, and MDH. It's identified those, and it's been able to identify some of the higher order effects that are important. Now at this stage, what you could do is augment. A screening design is all about screening for the important factors, and then in the next step of the experimental sequence, we can augment to learn more about the higher order effects, the higher order terms for the response surface model. What I'm going to show you here, though, is an alternative approach we could have taken. Here is an experimental design with 31 runs. Again, same six factors, the same factor ranges. You'll notice that it is a three-level design. For each factor, we've got settings at three levels. It's a response surface design. Using the Compare Designs platform, we can compare those two designs. The Definitive Screening Design is here in blue; we're looking at its powers versus the 31-run OMARS Design. Well, it's not a surprise that the 31-run design has higher power. Generally, we've got more runs, so we would expect that. We can see significantly higher power for these quadratic effects, though. Another thing to look at that might be of interest is the color map on correlations. Here's our 17-run Definitive Screening Design, and you might recognize that color map if you know anything about Definitive Screening Designs. This color map is really key. 
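The color map on correlations mentioned above is just the matrix of absolute pairwise correlations among the columns of the model matrix: main effects, two-factor interactions, and quadratics. Here is a small sketch of how such a map could be reproduced for any three-level design held as a coded pandas DataFrame; it is an illustrative approximation of the JMP diagnostic, not a replacement for it (for a purely two-level design the quadratic columns are constant and would have to be dropped).

```python
# A sketch of a "color map on correlations": expand a coded design into main
# effects, two-factor interactions and quadratics, then plot the absolute
# pairwise correlations between those columns.
from itertools import combinations
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def model_matrix(design: pd.DataFrame) -> pd.DataFrame:
    """Main-effect, two-factor-interaction and quadratic columns."""
    cols = {f: design[f] for f in design.columns}
    for a, b in combinations(design.columns, 2):
        cols[f"{a}*{b}"] = design[a] * design[b]
    for f in design.columns:
        cols[f"{f}^2"] = design[f] ** 2          # constant (and unusable) for 2-level designs
    return pd.DataFrame(cols)

def correlation_map(design: pd.DataFrame, title: str) -> None:
    X = model_matrix(design)
    corr = np.abs(np.corrcoef(X.values, rowvar=False))
    plt.imshow(corr, cmap="Blues", vmin=0, vmax=1)   # white-ish = near-zero correlation
    plt.xticks(range(len(X.columns)), X.columns, rotation=90)
    plt.yticks(range(len(X.columns)), X.columns)
    plt.title(title)
    plt.colorbar(label="|correlation|")
    plt.show()

# e.g. correlation_map(dsd_17run_coded, "17-run DSD")
```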
It demonstrates a key property of Definitive Screening Designs, which is that all of our main effects are orthogonal to one another, and the main effects are also orthogonal to the second order effects, the quadratics and the two-factor interactions. That's what all that white space there means. Then within the higher order terms, there is some degree of correlation, but no complete correlation, no aliasing. We are always able to estimate some of these higher order terms, and those higher order terms are, at least, orthogonal to and completely separately estimated from the factors' main effects. Now if we look at this 31-run OMARS Design, you can see it's got similar properties. Again, we've got orthogonal main effects, and those main effects are orthogonal to the second order effects. You can see we've got lower correlation between the quadratic effects, for example. Overall, with the two-factor interactions as well, there are lower correlations. Why are these things called OMARS? Well, OMARS stands for Orthogonal Minimally Aliased Response Surface designs. Again, we've got orthogonal main effects, and we've got minimal aliasing between our second order effects as well, and it's a response surface design. In fact, both of these, both the Definitive Screening Design and the 31-run design, are OMARS. They are both Orthogonal Minimally Aliased Response Surface Designs, so DSDs are a subset of OMARS. How well does this perform? What I've done is I fitted a model to that simulated data. Again, I simulated the response data for our 31-run OMARS Design, and I've compared that model against the 17-run Definitive Screening Design. The 17-run Definitive Screening Design is doing a reasonable job of predicting the actual data. Here, we're comparing how well our two models, from the Definitive Screening Design and the OMARS Design, fit against the actual data from the 48-run published example. You can see a much improved model with the 31-run OMARS Design, as we might expect. In fact, the 31-run OMARS Design has correctly identified the higher order terms that are important, as well as identifying the important factor effects, which was pretty much all the Definitive Screening Design was able to do. Again, just to reiterate, what we're showing here is that these OMARS Designs are really an extension of Definitive Screening Designs, and they are a bridge between that small, efficient Definitive Screening Design and the larger traditional response surface designs. At this point, I'll hand you over to Hadley, who's going to show you more about an add-in that he's created that will enable you to actually explore this new class of designs for yourself. All right. Thank you very much, Phil. Hello to everyone watching this online, wherever you are. Thank you very much for clicking on this talk. Before I take you through the add-in to show you how you can use it to generate these designs and select the best one for you, I'd like to say that the add-in itself includes 7,886 files, each one containing a design where the main effects are orthogonal to each other and to the higher order terms. The add-in not only gives you access to these 7,886 new designs, but it also gives you access to all of these designs with an added center point. How can we select from among these almost 16,000 designs the best one for our situation? The add-in provides us an interface to allow us to do that, and I'll show you how that works. Right now, the add-in is called OMARS Explorer. 
What it allows us to do is first indicate the number of factors that we have, and the add-in at this moment has the ability to generate designs for five, six, or seven continuous factors. We can enter the maximum number of runs that we can afford, or that we'd like to do, as well as whether we'd like a design for which we can estimate all main effects; all the main effects plus all the two-factor interactions; or the full response surface model. We have the option of generating parallel plots, something we can use to help us select the right design; I'll show you how that works. So I'll press okay. I can put in the names of my factors as well as the high and low settings, but I'm just going to leave it the way it is for now. I've been given this table with 2,027 designs that satisfy our requirements: each one has five factors and no more than 35 runs, and we can fit a full response surface model. So how can we now select the best one? Well, one thing we could use is the local data filter, where we can select designs of a certain run length, with or without center points, as well as our efficiencies, the average or max variance of prediction, the powers for the intercept and the main effects, and then the minimum and average powers for the two-factor interactions and the square terms, if we have a full response surface model. Because we generated the parallel plots, we also have the parallel plot here. We can use all of this to zero in on the best designs among all the ones that we've chosen. If the minimum power of the square terms is important to us, I can narrow my search to 10 designs rather than the 2,000-plus designs that were possible. Once I've done that, I can run this Get Summary Results script on the table and generate this table here with the names of the designs, whether each design includes a center point or not, the number of runs, as well as all of the metrics. Let's see, I think I'll go ahead and just choose this one here. I can press Make Design, and now I've been given this design in JMP. One thing I'll add is that if you choose a design with the center point, it adds a -0 at the end to indicate that the center point has been added to that design. I can now go ahead and add my response column, save the table, and I'm ready to start conducting my experiment. As Phil mentioned before, Definitive Screening Designs are a subset of OMARS Designs. Of course, there are many other Orthogonal Minimally Aliased Designs that are not Definitive Screening Designs. I'll show you an example here that uses six factors and a maximum of 20 runs. In this case, we only have eight designs that meet these criteria. I'm just going to go ahead and select all of them and press Get Summary Results. Now, this 13-run design here with the center point is actually the Definitive Screening Design for six factors. You can see that this design is, in every way except the power for the intercept, better than this 15-run OMARS Design, which is not a Definitive Screening Design. But I'm going to go ahead and select both of these so that I can compare the designs. When I do that, it opens both tables as well as this Compare Designs platform. Scrolling down to the color map on correlations, I can see that the Definitive Screening Design, which is this one here, looks as I would expect it to. Of course, the OMARS Design is also orthogonal for the main effects; that's what defines it as an OMARS Design. 
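The interactive filtering Hadley does here, narrowing roughly 2,000 candidate designs by run size and by the minimum power of the square terms, is essentially a table query. Below is a hedged pandas sketch of the same idea, with a hypothetical catalog file and invented column names; the add-in's actual export will differ.

```python
# A sketch of the kind of filtering done interactively in the add-in: start
# from a catalog of candidate designs with their metrics and narrow it down.
# The catalog file and its column names are hypothetical.
import pandas as pd

catalog = pd.read_csv("omars_catalog_5factors.csv")

shortlist = (
    catalog
    .query("Runs <= 35")                    # what we can afford
    .query("MinPowerQuadratic >= 0.8")      # square terms matter to us
    .sort_values(["Runs", "MinPowerQuadratic"], ascending=[True, False])
)
print(shortlist[["Design", "Runs", "CenterPoint",
                 "MinPowerQuadratic", "AvgPredVariance"]].head(10))
```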
But you'll also notice that this one happens to be orthogonal for many of the higher order effects as well. If I were to try to fit the full response surface model, to add those terms to this model, of course I won't be able to add all of them, but I'm able to fit one additional term using my OMARS Design than I would be using the 13-run Definitive Screening Design. If I try to do that, you'll notice that the powers for the intercept, the main effects, and the quadratics are all higher for the OMARS Design; they are lower than the Definitive Screening Design for the interaction terms. Looking at the fraction of design space plot, you'll see that the OMARS Design has a higher maximum prediction variance, but is lower than the Definitive Screening Design over more than 80 percent of the design space. Interestingly, the Definitive Screening Design platform doesn't have the ability to generate 15-run, six-factor designs; we can generate 13 or 17 runs. If we can't afford 17 runs but can afford 15, this provides us perhaps an option that may be suited to us and that we'd like to consider or explore further. Once again, thank you all for your attention. At this point, I'd like to turn things back over to Phil. Thanks, Hadley. Just to summarize what you've seen there, what we've shown you: hopefully, you've seen how these Orthogonal Minimally Aliased Response Surface designs can bridge the gap between the small, efficient Definitive Screening Designs and large, high-powered, traditional response surface method designs. You've seen how there's more flexibility: there are Orthogonal Minimally Aliased Designs with three levels for different numbers of runs now. If a Definitive Screening Design doesn't meet your needs, or a traditional response surface method design doesn't meet your needs, you should now be able to explore these OMARS Designs. Exploring those designs is now made easier for you as a JMP user with the add-in that Hadley has created. We'll obviously post links to all of these things in the article in the community, and that's a great place to let us know if you've got any questions as well. Thanks very much for your attention.