Corinne Bergès, Six Sigma Black Belt, NXP Semiconductors Kurt Neugebauer, Analog Design Engineer, NXP Semiconductors Da Dai, Design Automation Engineer, NXP Semiconductors Martin Kunstmann, R&D-SUP-Working Student, NXP Semiconductors Alain Beaudet, Product and Test Engineer, NXP Semiconductors   Structured Problem Solving (SPS) is one of the three pillars of the NXP Six Sigma system, alongside Quality Culture and Continuous Improvement, and is a further demonstration of the maturity of the NXP quality system. The key approaches in NXP SPS fit within the DMAIC/DMADV, 8D and 5-Why frameworks. They rely heavily on statistics (modeling, DOE, multivariate analysis, ...) to turn assumptions into evidence, which is necessary for true elimination of a defect's root cause. Two specific statistical analyses are described. The first, in automotive design, concerns the simulation of parametric, hard or soft defects: the goal is to implement the best algorithm for reducing the number of simulations without degrading test coverage or the precision of the failure-rate estimate, and JMP provides useful clustering options for this. The NXP experiments result in an algorithm and in recommendations for the new IEEE standard effort on defect-coverage accounting methods. The second, downstream in manufacturing, concerns capability index computation and normality testing: to get around the high sensitivity of normality tests to slight departures from normality, a methodology was designed in JMP to quantify the shift from normality using the SHASH (sinh-arcsinh) distribution and its kurtosis and skewness parameters. A script was implemented to automate this across the more than 3,000 tests for an automotive product.
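The abstract describes quantifying the shift from normality with the SHASH distribution's skewness and kurtosis parameters rather than relying on pass/fail normality tests across 3,000+ test columns. The JSL implementation is not shown here; as a rough illustration of the idea only, the Python sketch below screens hypothetical test columns by sample skewness and excess kurtosis (simple stand-ins for fitted SHASH shape parameters) alongside a Shapiro-Wilk p-value and a capability index. Column names and spec limits are invented.

```python
# Illustrative sketch only (not NXP's JSL script): screen many test columns by
# capability and by how far their shape departs from normality, using sample
# skewness/excess kurtosis as crude stand-ins for fitted SHASH shape parameters.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical parametric test data: one column per test, spec limits per test.
tests = pd.DataFrame({
    "test_001": rng.normal(5.0, 0.1, 500),
    "test_002": rng.gamma(shape=9.0, scale=0.05, size=500) + 4.5,  # mildly skewed
})
specs = {"test_001": (4.7, 5.3), "test_002": (4.6, 5.6)}  # (LSL, USL), assumed

rows = []
for name, x in tests.items():
    lsl, usl = specs[name]
    mu, sigma = x.mean(), x.std(ddof=1)
    ppk = min(usl - mu, mu - lsl) / (3 * sigma)
    rows.append({
        "test": name,
        "Ppk": round(ppk, 2),
        "skewness": round(stats.skew(x), 2),             # 0 for a normal distribution
        "excess_kurtosis": round(stats.kurtosis(x), 2),  # 0 for a normal distribution
        "shapiro_p": round(stats.shapiro(x).pvalue, 4),  # very sensitive at n = 500
    })

print(pd.DataFrame(rows))
```

The point of the sketch is the contrast the abstract draws: at large sample sizes the normality test rejects for tiny departures, while the shape parameters say how far from normal the distribution actually is.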
Laura Lancaster, JMP Principal Research Statistician Developer, SAS Jianfeng Ding, JMP Senior Research Statistician Developer, SAS Annie Zangi, JMP Senior Research Statistician Developer, SAS   JMP has several new quality platforms and features – modernized process capability in Distribution, CUSUM Control Chart and Model Driven Multivariate Control Chart – that make quality analysis easier and more effective than ever. The long-standing Distribution platform has been updated for JMP 15 with a more modern and feature-rich process capability report that now matches the capability reports in Process Capability and Control Chart Builder. We will demonstrate how the new process capability features in Distribution make capability analysis easier with an integrated process improvement approach. The CUSUM Control Chart platform was designed to help users detect small shifts in their process over time, such as gradual drift, where Shewhart charts can be less effective. We will demonstrate how to use the CUSUM Control Chart platform and use average run length to assess the chart performance. The Model Driven Multivariate Control Chart (MDMCC) platform, new in JMP 15, was designed for users who monitor large amounts of highly correlated process variables. We will demonstrate how MDMCC can be used in conjunction with the PCA and PLS platforms to monitor multivariate process variation over time, give advanced warnings of process shifts and suggest probable causes of process changes.
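The CUSUM chart described above accumulates deviations from target so that small sustained shifts eventually cross a decision limit. A minimal tabular CUSUM sketch in Python (not the JMP platform itself), with an assumed target, reference value k and decision interval h in sigma units:

```python
# Minimal tabular CUSUM sketch (illustrative; JMP's CUSUM Control Chart platform
# adds much more, e.g., average run length assessment and V-mask options).
import numpy as np

def tabular_cusum(x, target, sigma, k=0.5, h=5.0):
    """Return upper/lower CUSUM statistics and indices of out-of-control points.

    k and h are in sigma units: k is the reference (allowance) value,
    h is the decision interval.
    """
    c_plus, c_minus = 0.0, 0.0
    cp, cm, signals = [], [], []
    for i, xi in enumerate(x):
        z = (xi - target) / sigma
        c_plus = max(0.0, z - k + c_plus)
        c_minus = max(0.0, -z - k + c_minus)
        cp.append(c_plus)
        cm.append(c_minus)
        if c_plus > h or c_minus > h:
            signals.append(i)
    return np.array(cp), np.array(cm), signals

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(10, 1, 50), rng.normal(10.7, 1, 50)])  # small shift
cp, cm, signals = tabular_cusum(data, target=10, sigma=1)
print("first signal at sample:", signals[0] if signals else None)
```

A Shewhart chart with 3-sigma limits would rarely flag a 0.7-sigma shift on any single point; the CUSUM statistic accumulates it until the decision interval is crossed.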
Monday, October 12, 2020
As we develop analytical tools in JMP, we inevitably must make decisions about how to prioritize:   Should we make the product more powerful by adding more muscle? Should we make it sexier, more exciting? Should we focus on pain relief, making it less frustrating, less burdensome? In the language of long A-words, should we go for anabolic, aphrodisiac or analgesic?   John Sall thinks the answer is analgesic. Pain relief should be the central motivating force for development. Of course, the three aren’t mutually exclusive. Adding an exciting power feature could also relieve pain. But pain relief is central, because pain is the condition that can really freeze us, demotivate us, make us stop at a less-than-full perspective of what our data can tell us.       Auto-generated transcript...   Speaker Transcript Jeff Perkinson When it comes to developing JMP features, John Sall, our next speaker, has an interesting way of thinking about how we prioritize. We do have to prioritize which features we...   which features we want to invest in. So how do we think about what's most important? Are features that make our product more powerful what we should work on?   How about features that make it more attractive or sexier or features that focus on pain relief, easing the burden of analysis?   Well John thinks that pain relief should be the main motivator for our R&D team. John is the co founder of SAS. He's the lead architect of JMP, which he founded more than 30 years ago.   For scientists and engineers who have data and need a way to explore it needed...need a way to explore it easily visually on the desktop.   He's going to talk about the driving forces behind JMP development and the ways that we come across features we want to work on. So with that, I'd like to welcome you, John. Thank you. John Sall Thank you, Jeff. And here we are live from the new digital conference center in Cary North Carolina, where we can fit a lot more people.   So let's dim the lights a little.   Well, I'm not really here in digital conference center. I'm really at home. So I got a home office. Whoops. Not that one.   Like all of us. I'm at home, delivering this conference. But let's switch to a corporate background and get started.   So what should I talk about?   Well, when we have JMP releases, on those years that the conference alliance along JMP releases, that's what we talk about, but on the alternate years I always just pick a topic. So I picked big statistics one year, or ghost data the next year or secret features   two years ago. Today my topic is big words that start with the letter A.   Well, these are words characterized themes for developing JMP. So let me share my screen.   And   minimize   here. Share.   So which theme is most important? And here are the three words start with...the big words that start with the letter A.   As Jeff just mentioned, first is anabolic. Well, anabolic is all about growing muscle, making JMP more powerful.   Anabolic processes build organs and tissues and these processes produce growth and differentiation of cells to increase body size.   So increasing the power of JMP would be the opportunity and that's of course very important, but the next   aphrodisiac, making JMP more sexy, more exciting, the thrill of discovery. So we want that to be an attribute too.   But we also care about your progress during the work stream, so analgesic is the third word, and that means pain relief.   pain relief.   
So I want people to express things like the following imaginary quotes: "Version 15 has just saved me hours or even days of work that I used to have to labor over." That's a lot of pain relief.   Or "I used to have to do each one individually and now it is all in one swoop." A lot of pain relief there. Or "I used to have to use several other tools to get everything done, but now I can do everything in JMP."   Or "I showed my colleagues how I did it so easily. And they were amazed." Or "I used to have to program all this manually, and now it's just a few mouse clicks and it's done."   Or "JMP keeps me in flow, keeping my attention on analyzing the data rather than diverting it too much to operational details." These are the expressions of pain relief that we want to hear.   Now why? Well, pain and frustration are real demotivators. Flow is important. You're undistracted when in flow. When in flow, you're more likely to learn something from the data and make discoveries.   Now power is important too, but new features tend to be used by only a few and old features by many. So we want to make the old features work better   to reduce the pain in using them. And of course, productivity is hugely dependent on how effective our tools are with respect to the use of time. So if we can make it easier and faster to get to the results, that is a huge win. So let's go through a lot of examples.   First, the scripts that come with the data tables. So here you see, side by side, an example from an earlier release of JMP, I think it's JMP 14 or 13, where   you used to have to hold down the mouse key on this button, and then it brought up a menu item, and then you found a run button to run that script.   But the new way just has a play button right next to the script. So you just click on that play button and it saves pulling down a menu.   Well, that's not a huge win, you think; just the difference between clicking a button and pulling down a menu is not very much. But it's also a big win for new users, because new users no longer have to understand   that a hot button will have a run underneath. There's the play button right there with the script, and so there's more understanding that these are scripts.   And the ability to store scripts with your data is a big deal in JMP and we want people to learn that right away   and not have to fish around or look at a tip to learn that that's the way it works. And it's a great convenience to be able to store all your scripts   within the data table itself and not find other places to store them. So for new users, the play button. That is some pain relief just in that simple change.   And this is a change that we regret not making much earlier in JMP.   Let's talk about preparing data. And one of the big things that you have to do when you prepare data is join data from multiple sources.   Now JMP, even from version one, had a join utility. But this still involves a complex dialogue and working with multiple data tables.   So here's the classic example where I'm renting movies and we have our movie rentals transaction table   that has the customer, the item number, the order date, the order year; and then we have our inventory of movies to rent, details about each movie; and then we have information about the customers and so on. And we want to ask questions involving all three tables, like   which genders are prevalent in which genres of movies. 
Well that involves   going across all three tables, and of course, the join command is the way to do it. I take the transactions data and I   merge it with customers. And of course, customer ID is the matching   column and I want to include the non matches from the transaction side. So if I don't have customer data, I still don't lose my transaction data. And that's called the left outer join, and so that makes the new data table. And now to that data table, I want to join that   to the inventory data. So I join that by the item number which I match then and that I also want a left outer join and now I have   another table. And so now, instead of having three tables, I have five tables, the results of that join, but now I can look at answering the question of gender by genre, which   gender tends to rent more of which type of movie.   And involved making all these new intermediate data tables as well.   It turns out that there's a lot easier way to go and the pain of just these extra steps and extra tables can be vastly reduced. So let me undo this table on this table.   And I want to point out under the tables command, there is a command called JMP query builder and this brings up the the sequel facility   for joining tables, that's much easier than all those join dialogues, but it still also makes new data table.   But a couple releases ago, we came out with something even better that reduce the pain even further and that's virtual join. So if I look at this transaction table,   I noticed that it's already joined.   So what's happened is that when I prepared the, the movie inventory table, it's uniquely identified by item number. And so I give item number then   a link ID saying it's uniquely identified by that and that's the way I can index that that table. And similarly, I have an item ID to identify the customer data. And when I prepare my transaction data, I have these two have, in a sense foreign keys,   a link reference and it's referencing different data table by the, the...the identifying ID across them. And so, it automatically links it and I already have everything I need to answer that question. So I can do   for each   gender, I can   find the genre of the movie I want to do. And so I can see here with action movies are liked more by the males and family movies by the females and   rom com more by the females and so on. So I've answered my question involving three tables without the need of going through all the joins, because it's virtually joined by matching up those ID columns across the tables.   So I've made   a lot of pain relief from that. So let's close that, in fact.   So data table joining has involved, I think, a lot of pain relief over the releases.   Recoding,   that's one of the main other things we do when we're preparing data sets.   We started with a very simple recode many releases ago where we just had a field for the old value and the new value and the new value was started out with the same as the old value and then I can cut and paste or edit those values in order to recode   into a new column or the existing column.   And recode every release has gone through dramatic improvements and we're very excited with the current state of that. And so let's look at this data set of hospitals, district hospitals. And now recode is right on the   right button or right click menu in the column header. And I can go through and just   select a bunch of things and group them to new value, like clinic.   But I think there are lots of shortcuts as well. 
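The left outer join walkthrough above has a direct analogue outside JMP. The hedged pandas sketch below, with made-up rentals, customers and inventory tables, shows the same two-step left join; virtual join, as described, gets the same answer by linking the ID columns without creating intermediate tables.

```python
# Illustrative pandas version of the two left outer joins described above
# (table and column names are made up for the sketch).
import pandas as pd

rentals = pd.DataFrame({"customer_id": [1, 1, 2, 3],
                        "item_number": [101, 102, 101, 103],
                        "order_year": [2019, 2019, 2020, 2020]})
customers = pd.DataFrame({"customer_id": [1, 2], "gender": ["F", "M"]})
inventory = pd.DataFrame({"item_number": [101, 102, 103],
                          "genre": ["Action", "Family", "Rom Com"]})

# Left outer joins keep every transaction even when customer/item data is missing.
joined = (rentals
          .merge(customers, on="customer_id", how="left")
          .merge(inventory, on="item_number", how="left"))

# "Which genders rent which genres?"
print(pd.crosstab(joined["genre"], joined["gender"]))
```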
And one of my favorite shortcuts is   to group similar values. And so I can give some threshold of the number of character differences and so on and it will automatically group all those things. So it'll group all the district hospitals with slightly different spellings into one.   And I can still...   looks like it missed one. But it's okay, I can now group that one with the others and   all the, all the rural hospitals have been automatically grouped by those matching and looks like I can match an extra one here.   And now all of a sudden I have something. I've saved a lot of effort in doing all that recoding.   And of course, in JMP 15 we added new features, which I won't show here, but I want to point it out. If you have another data set with all the   ...proved that...the best category names for those categories, I can go through a matching process and have it choose the closest match when it finds a match to all those things. So   so recode has gone through a lot of evolution over the releases to reduce the pain in that operation. And of course, think about it. How much time do we spend preparing the data versus analyzing it?   Often it's anywhere from 70 to 90% of our time is spent preparing the data. And so, reducing the pain of that and reducing the time of that become major wins. So pain relief on recoding.   Clicking on values. Here's   the cities data set.   And let's suppose when I'm looking at the details for some of these things, like might be an outlier for something, I want to look at it in greater detail. And so I can turn that into a link. So let's   let's turn it into a wiki page Wikimedia page thing. I'm going to copy that text there and go to column info and then turn that into a link, which is the expression handler.   And now I can take   this   table and convert it into a web   address. So   I'm going to go to wikipedia.org and then take the   the name of that city and change it to title case and I can test it on that row and brings up Albany.   And so now I'll say...   Oh, OK.   And now I can click on any city and then get the Wikipedia article on that city. Or it can change it to map coordinates. So I can...let's copy that text and go to the column info and now instead of that, I'll   do title case on that.   Let's see if that parses OK. And now I can click on Denver and it will query Denver and the Google Maps.   Okay, so I've made things into links to do...I can do searches, can do map requests and so on. I can even   paste this speak into it and click on that and have it speak the name of the city. And so this I think gives great power to be able to store links in our data tables   that link out to web pages or do other things, anything you can express with ACL, you can do with those event handlers. So that can reduce a lot of pain.   So one of my favorite pain reduction techniques is broadcasting, and this is where you hold down the command key or the control key to do multiples, where you have a lot of analyses that are similar. Okay, so   this works not just for menu commands, but for many buttons and for resizing graph frames and doing other things, pasting things into graphs and so on.   So let's do an example of that. And let's click on...or just do a distribution on all these columns.   And let's say I wanted to get rid of   this box plot.   And if I just held down   that menu item, I would eliminate the box plot for that item. But what I want to do is eliminate the box plot for all items. 
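The "group similar values" recode option described above clusters values whose spellings differ by only a few characters. A rough Python sketch of the same idea, using difflib similarity with a hypothetical threshold (JMP's matching options, such as character-difference thresholds and matching to a reference list, are richer than this):

```python
# Rough sketch of grouping near-duplicate category labels (cf. Recode's
# "Group Similar Values"); the threshold and data are made up.
from difflib import SequenceMatcher

values = ["District Hospital", "Distrct Hospital", "district hospital",
          "Rural Hospital", "Rural Hosptal", "Clinic"]

def similar(a, b, threshold=0.85):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

groups = {}  # representative label -> members
for v in values:
    for rep in groups:
        if similar(v, rep):
            groups[rep].append(v)
            break
    else:
        groups[v] = [v]

# The recode map sends each raw value to its group's representative label.
recode_map = {member: rep for rep, members in groups.items() for member in members}
print(recode_map)
```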
So what I'm going to do is hold down the command key and uncheck the box plot and now it's unchecked for all those things. I can now uncheck the quantiles   and it will uncheck for everything, because it has taken that command and broadcast at all the active objects in that window, and for those active objects that understand that command, the quantiles command for example, it will then obey that that command.   Now that even works for some things that have to do with prompting dialogues. So let's look at summary statistics. Let me hold down the command key and customize summary statistics.   And let's say I don't want the confidence limits, but I do want the number of missing values. And I'll say, okay, and it's going to apply that to all...   all the things. In order to do that, it's had to   take the results of that dialogue, make a script out of it and then broadcast that script to all the other places in that window.   Now that doesn't happen all the time. Sometimes when you get a prompt, you will get a prompt for every analysis. For every release we try   to implement a few more details, where the dialogue is done before the broadcast so it'll broadcast the same dialogue to every place available. And of course, everything else I can broadcast a change and it will change, you know, the size of all these items, or I can broadcast   changing the background color, and I can change all the background colors to orange in all the plots   that that seem relevant for that that frame box. So this broadcasting becomes a very powerful tool.   So,   that has saved a lot of pain for most of us, but there's still cases where you may have 40,000 by groups, as came in earlier this year, and he wants to do an inverse prediction on all 40,000 and it prompts you 40,000 times.   That will be fixed for version 16. That is a lot of pain relief for that one user.   Saving formula columns.   I love formula columns. When you fit a model you can save a column of predictions. But in that prediction, there's a live formula so that you can examine that formula, you can modify it, you can apply it to new data that comes in. You can profile it to the save formula. You can   if you have   JMP Pro, you can go to the formula depot and do a lot of other things. You can generate code from it and so on. So the ability to save formulas is an important thing, but sometimes   if you have by groups, for example, saving formulas has a lot of extra subtlety.   For example, let's fit, this is the diabetes data, where we fit the response   against all these predictors and but we want to do it separately for each gender.   So let's do that.   And so we have two by groups for gender one and gender two. And now let's say we want to save   the predicted value, in fact, for both of these. Now   in the old days before we subjected to this to a lot of pain relief methods,   what would happen when you saved it, it would go to a local data table. If you look in the data table window under redo, it shows the by group table.   So really for gender=1, there's 235 in this virtual data table here, which you can show for gender=1. There's a table variable here that shows the the by group for it.   And it would save it to this temporary table instead of the real table that you have that is really saved.   But   some time ago, when you saved the prediction formula, we save it to the real data table. 
And if I look at this, I just saved it for gender=1,   and if I look at the formula for that, it shows that if gender=1, that was the by group clause, then it has this linear combination that forms a prediction for that variable.   And then if I then do gender=2 or if I held down the command key to broadcast prediction formula, now it has both of them and I have both clauses available for gender 1 and gender 2. So it's adding a clause each time to that output by group.   By the way, the same thing happens when you save other things, when I save a column of residuals, and let's hold down the command key this time.   If I save a column of residuals, it will save it for each by group, it will save the residual appropriate for that by group. It's as if you subtracted predicted formula from the original response, but this time without being a formula column.   In almost all places in JMP, we haven't done it everywhere, but we've done it in most places, saving into a by group   or in some cases through a where clause, it will then save it to the original data table with the condition in the if statement. Okay, so that has   saved a lot of effort. If that hadn't been available, tt was saved in a temporary table and then you'd have to cut and paste that formula and make if tables to the original table. So we've solved these problems. Now let's suppose I saved it again, did that whole process again,   and   and did another thing, say with different variables, some of them removed,   and now saved it again.   So,   I save prediction formula, this time holding down the broadcast key, it's actually making a new one instead of saving into the old one and appends a 2 after it. Well, how does it know not to save it into the old one?   Well, each fit has an ID to it. And if I look at the properties   by ID, it has the number to it, a unique number which is regenerated for each fit, and so as long as it has a different by ID,   the different by groups will have the same ID with one by partition. But as long as it has the same by ID, it will save it into the same thing. Otherwise it will make a new place to form to save it.   Also, whenever we say predictive values,   especially prediction formulas,   it will create a attribute the, the creator and in some cases other information. In this case it has the target variable y, that's what is predicting and the creator fits least squares.   And then when you do other platforms such as if you have JMP Pro model comparison, then that model comparison will understand which which predicted values referring to which creator, in which target. And so it can keep track of all that information for those added value platforms.   So,   formula columns work by adding new clauses and other properties are used, including the by ID.   And the prediction clauses, if you save them to categorical variables, it saves a whole range of variables to have the probabilities for each response level, and all that is contained with all the metadata and needs. And that should save a lot of pain.   So,   removing effects from models.   Well, let's look at an example   of fitting   just fitting height by weight.   Well, here's an example of a high degree polynomial. This is a seventh order polynomial   that fits better than   sixth order polynomials and so on. But would you trust that fit?   It turns out that if you give the model a lot of flexibility by introducing high polynomial terms or just by introducing more variables, it gives a lot more opportunities for to fit. 
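To make the by-group "save prediction formula" behavior concrete: the saved column is a single formula that branches on the by variable, with one fitted clause per group. Here is a hedged Python sketch of that idea, fitting a separate least-squares model per gender on an invented table and saving one prediction column that applies the matching clause row by row; it is an analogy, not the JMP formula itself.

```python
# Sketch of a by-group prediction "formula": one fitted clause per by-group,
# applied according to each row's group (data and column names are invented).
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame({"gender": rng.choice([1, 2], 200),
                   "age": rng.uniform(20, 70, 200),
                   "bmi": rng.uniform(18, 35, 200)})
df["y"] = 2.0 * df["bmi"] + np.where(df["gender"] == 1, 1.5, 0.5) * df["age"] \
          + rng.normal(0, 5, 200)

# Fit y ~ age + bmi separately for each by-group and keep the coefficients.
coefs = {}
for g, sub in df.groupby("gender"):
    X = np.column_stack([np.ones(len(sub)), sub["age"], sub["bmi"]])
    coefs[g], *_ = np.linalg.lstsq(X, sub["y"], rcond=None)

def pred_formula(row):
    # Analogous to: If(gender == 1, <clause for group 1>, <clause for group 2>)
    b = coefs[int(row["gender"])]
    return b[0] + b[1] * row["age"] + b[2] * row["bmi"]

df["pred_y"] = df.apply(pred_formula, axis=1)
df["residual"] = df["y"] - df["pred_y"]   # residual saved without a formula, as in the demo
print(df.head())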
And in this case it's   it's allowed the flexibility to make a deep dive between the the previous data and the last data point just to fit that last data point better.   And so it's overfitting. It's way...it's using that parameter to fit noise instead of fitting the   the data itself.   And so overfitting is is a problem anytime you have big models. So   you want to fit the signal, not the noise. Okay, large models tend to introduce more variation into the prediction, because the prediction, after all, is   a function of the y variable they're using, but also that y variable is is is a systematic part of that variable plus the error part of that variable if that variable part   is random, then your prediction involves that that randomness. And if you allow it too much flexibility, it's going to end up with   an overfit problem that you're going to predict much worse by including all the variables in that model.   So the cure for that is to shrink the model, to reduce the size of the model or reduce the size of the coefficients in the model,   so that less of that variation from the random term of the model is transmitted into the predicted value. And so   in small DOE problems, this is not an issue because the data is small and the data is is well-balanced to fit exactly what model you're going for. But for observational data in any large models, overfitting is a real problem.   So users often didn't appreciate that until we introduced cross validation in JMP Pro. So if you have JMP Pro, you can set up a validation column,   which will hold back some of the data in order to estimate the error on that hold back data set. And here's an example where I have...I'm trying to predict the concrete properties   depending on all these ingredients in the concrete, and I have a huge model for it. And if I just run that model, but hold back some of the data and look at the R square on that,   I've for SLUMP, I have a great fit. I have an R squared 79, but on the validation set I fit, I have an R square that's negative.   Any R square that's negative means it fits worse than just fitting the mean. So if I'd fit the mean, I'd have an R square of zero. If I fit this whole model, I have an R square that's much worse than that. So, with large models, you can go worse than just fitting the mean it's it's worse than   It's kind of anti informative because the model is so big and we had no effort and cutting down the size of the model. The model has been dominated by the noise and not the signal. So this is a problem that you should pay attention to.   So, the important part is to be able to reduce the size of the model.   And now we did introduce a model dialog command to do that.   Let's go into the diabetes with with model.   I can run this model and if, let's say I want to take out a lot of these things are not very significant. You know, age   has a totally non significant contribution to the model, and so I want to eliminate age. Well then I would, I could go back to fit model and   recall it or and then eliminate age. Or there's, I can just go back to the model dialogue directly here and fit age and remove it, but I may have a long list of models here and going back and forth to do these things is a fair amount of work.   And so rather than do that...   let's see, what am I doing here.   Several releases ago, we introduced a new report called the effects summary.   And with effacts summary, it makes it trivially easy to subset the model to make it predict better,   give it less flexibility.   
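To see how an over-flexible model can score a negative R-square on held-out data (worse than predicting the mean), here is a small self-contained Python sketch with simulated data rather than the diabetes or concrete tables used in the demo. The flexible fit's validation R-square is typically far below its training R-square and can easily go negative.

```python
# Overfitting sketch: a 7th-degree polynomial vs. a straight line on held-out data.
# R^2 < 0 on the validation set means the model predicts worse than the mean does.
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 18)
y = 2 * x + rng.normal(0, 0.4, 18)          # the true signal is linear

train, valid = np.arange(12), np.arange(12, 18)   # simple holdout split

def r2(y_true, y_pred):
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - sse / sst

for degree in (1, 7):
    beta = np.polyfit(x[train], y[train], degree)
    print(f"degree {degree}: "
          f"train R^2 = {r2(y[train], np.polyval(beta, x[train])):.2f}, "
          f"validation R^2 = {r2(y[valid], np.polyval(beta, x[valid])):.2f}")
```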
So I can say, well, age, I can remove that.   Or, well, there's lots of variables. Let's remove three more.   LDL, well, that looks more significant. And so I can save that model.   Let's just save it to the data table.   And I come back later and say, well, let me remember that model that has this.   I can actually undo the previous thing. And so I can undo that and it brings back   three of the variables and I can undo that and it brings back age. So it actually stores, when when I use the effects summary to edit the model, it actually stores a memory of all those things. And if I look at that script,   I can see this history thing. So every time I did effect model, it storing a clause of history. And when I do the undo's, it's undoing it back to the that history.   So removing is easy. I can also add things. If I subtracted these two things I could add it back. I could add back, say, age.   Not a good thing to do, but I can do it. I can also edit the model by bringing back a small version of the model thing and   of the model dialogue and add compound effects and so on.   Now another thing that happens with large models and   (let me undo this).   With large models,   you're doing a lot of hypotheses tests.   So if you have a large number of hypotheses tests, there's some adjustments that you should consider and one of them is called the false discovery rate. And so instead of treating all these p values as if they were independent tests,   I want to apply an adjustment so that those p values are adjusted for the multiple tests bias, for the selection bias involved in subsetting the model. And so I can apply a false discovery rate   correction to it. So instead of the regular p value, I have the false discovery adjusted p value for that. And now I'm being more realistic. So this is going to help me with the overfitting problem and the hypothesis...the selection bias problem in doing a lot of multiple test   things.   So,   now   let's   do the next topic and that's transforms.   Suppose   you want to do a model,   but   instead of y, you want to fit the log of y.   Well, before what you would do is create a new column in the data set,   Log y,   and then for Log y, you   specify a...   do a formula.   And I can take   a log of it.   And now I can go back to my model specification and do that Log y   and then I got my fit.   Oh, it's missing, what did I do?   Forgot to enter the   objective, the argument.   And so now I can   fit the log of Y   And now   I've done it. But let's do the profiler.   My profiler is in terms of the log of Y. Let me save the predicted value. My prediction formula is in terms of the log of y,   and now I'm going to have to go and hand-edit that...   that formula and take the exponential of that   to bring it back on on the original scale. So that's that's a lot of pain doing transforms that way.   Well, several releases ago   we introduced transforms.   So I can take that transform and   transform it to the log   and now use it directly there as a transform and it's not part of the data table. It's a virtual variable with a formula, but not added to the data table.   And now I can fit my model to the Log y. Let's   remove age so it fits a little better.   Whoops.   Had that selected too.   And now I've fit the response Log y.   So if I profile that, what do I get?   Factor profiling, profiler. Instead of profiling log of y, it profiles y. The profiler looks at that as a transform and says, well, I can invert that transform and go back to the original scale. 
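The false discovery rate adjustment discussed above is commonly the Benjamini-Hochberg step-up procedure. A small self-contained sketch of BH-adjusted p-values in Python, with the LogWorth (negative log10 of the adjusted p-value) that JMP-style reports sort by; the example p-values are made up.

```python
# Benjamini-Hochberg FDR-adjusted p-values (standard step-up procedure).
import numpy as np

def bh_adjust(pvalues):
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    order = np.argsort(p)                       # ascending p-values
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest rank down, then cap at 1.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.minimum(adjusted, 1.0)
    out = np.empty(m)
    out[order] = adjusted
    return out

raw_p = [0.0002, 0.009, 0.012, 0.04, 0.21, 0.6, 0.85]
adj_p = bh_adjust(raw_p)
logworth = -np.log10(adj_p)                     # LogWorth of the adjusted p-value
for rp, ap, lw in zip(raw_p, adj_p, logworth):
    print(f"raw p = {rp:<7} FDR p = {ap:.4f}  LogWorth = {lw:.2f}")
```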
And that's what it does.   It   untransforms, back-transforms through the log of y to take the prediction and put it on the original scale.   And it will do that for most transforms that can unwind. If a transform involves multiple variables and so on, it can't do it and it will just do the transforms. So the original   Same thing when I save a column. When I save the prediction formula,   it saves a prediction formula on the scale of yrather than the log of y.   And so these things are an incredibly time saving and saves a lot of effort in using transforms.   So there's...   Let me   go to diabetes again   and do another...consider another transform.   When you're doing variable,   there's a Box-Cox transformation that you can get.   And   among the factor profiling, the Box-Cox option   tells you if you transformed a whole range of power functions,   what, what would be the best to do? Should it be just untransform? That would be a Box-Cox lambda value of one.   If you did zero, that would be equivalent to taking a log of it. If you did -1, that would be equivalent to taking the reciprocal of it.   The power to the -1. If you did around .5, that would be equivalent to taking the square root of that.   And it's telling you that this model would fit better on a transform scale adjusted for that transform, if it was more along the square root transformation, where lambda was .453. That's the optimal value in that the Box-Cox transformation.   And you can zoom in on this   with a magnifier to get   to get it more precisely.   So,   So now I can transform, and several releases ago   I can...I added several columns. One is refit with transform, which will make a new window with the transform response.   And another is replacing transform and I'm going to do that. And rather than .453, I'm going to just take square root transformation (.5) and now I fit the model with that transform.   And and now lambda best is around 1, which is where it should be, because it's already transformed at once by Box-Cox transformation. And now I can save the predicted value of that and   profile it and so on. And I can even undo it. So if I don't like that transform and I want to go back to the original, I can go back to the original and refits.   So,   we've done a lot of pain saving   in transforming responses.   So now   there's a special pain, a special place of pain when you have a lot of data.   And we've gone to a lot of effort to try to solve big problems with less pain.   Whether you have lots of rows, lots of columns, lots of groups, many models to try,   in today's world we live in a world of big data with big problems.   So if we have analyses that were originally designed to handle small problems, it may not be appropriate for large ones, and the central problem with big problems is that there's just too much output to sort through.   If by fitting the same model to 1,000 variables, I have 1,000 reports to sort through. If I'm doing looking for outliers among 1,000 columns, there's just outliers, you know, separate reports for each column and so on. I want to be able to   more efficiently get through a lot of large data sets. So we developed the screening menu,   which is meant to solve these large kinds of problems. And plus, there are lots of places throughout JMP that solve large problems better as well. For example, time series. The new time series forecast platform will forecast many times series, instead of just one.   
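The same transform-and-back-transform pattern can be written by hand. A hedged sketch on simulated data (not the diabetes table): fit on the log scale, report predictions on the original scale, and let scipy's Box-Cox routine estimate the lambda that the Box-Cox plot in the demo reads off graphically.

```python
# Fit on a transformed scale, report predictions on the original scale
# (illustrative; JMP's transform columns and Box-Cox report automate this).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.uniform(1, 10, 200)
y = np.exp(0.3 * x + rng.normal(0, 0.2, 200))   # multiplicative noise: log scale fits better

# Straight-line fit to log(y), then back-transform the prediction with exp().
b1, b0 = np.polyfit(x, np.log(y), 1)
pred_original_scale = np.exp(b0 + b1 * x)

# Box-Cox: lambda near 0 suggests a log transform, 0.5 a square root, 1 no transform.
_, lmbda = stats.boxcox(y)
print(f"slope on log scale: {b1:.3f}, estimated Box-Cox lambda: {lmbda:.3f}")
```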
So the items on the screening platform explore outliers, explorer missing values, explore patterns. These are for doing checks of data and then things looking for associations, response screening,   process screening, predictors screening. And of course, the time series forecast is a new item. It's not in the screening menu, but it's organized for handling large problems.   And all these things take advantage, not just of more compact ways to represent the results so you don't have to the thousands of reports, but they're also computationally efficient. They use multithreading so it takes advantage of your   the multiple   cores in your CPU to make it very fast.   So let's say you got a lot of process data; you have 568 variables. So, which of these variables looks healthy? Well process screening is designed to answer that. And so it can sort by the stability or sort by the capability (Ppk) or which ones are bad off or sort by   control chart measures, out of control accounts and so on.   But what's even...I even like like better, are some of the tools that show all the processes in one graph. And there's two of them that I love. One is the   the goal plot that shows   how each process behaves with respect to the spec limits and if it's a capable processes it's in this green zone here.   If it's marginal, it's in the yellow zone. If it's not, it's in the red zone, and if it's high up, it has too much variance. If it's to the one side of the other, then there's a problem.   With...   It's off center, it's off target.   But if it's in the green zone then it's a good process, and with version 15 we introduce the   graphlets, the hover help   so you can see each process as you hover over it. And then the other   plot that I love in in summarizing all these these things and reducing the pain of looking through all these reports is this process performance graph. So on the vertical axis, it tells you whether you're within spec limits by the capability Ppk.   So if you're above this line at 1.33, then you're looking fine as far as the distribution of values with respect to the spec limits. You're well within the spec limits.   If...then the stability index is on the x axis. So if it's a pretty unstable process even if might be capable but unstable. So if you look at that process, it might have some   stability thing that wanders around some. And if you're in the yellow zone, you're   capable but unstable and so on. The red zone is the bad zone where you're both incapable and unstable. And so looking through hundreds or thousands of processes is easy now, where it used to be a lot of pain.   The question is, what changed the most? Here's some survey data where   over many different years, you asked a question about their activities. Did you   go camping? Or gamble in a casino? Or did you have a cold? You know, all these activities and you want to know which among all these activities (and there's   I think, 95 different activities) which of these made the most difference. And so you're looking for, you know...one thing you could do is go through one at a time   and fit that activity by year of survey.   Instead of looking at one at a time. I can look at response screening   and just look at one chart to see what changed the most.   And so this is showing a chart for all 95 of those things. First, the significance in terms of the negative log of the P value,   which we call the log worth, which is adjusted for the false discovery rate. 
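As a rough outside-JMP illustration of what process screening summarizes: for each of many process columns, compute a capability index (Ppk, from the overall sigma) and a stability-style ratio of overall sigma to short-term sigma estimated from the average moving range. That ratio is one common convention; JMP's exact definitions may differ. Column names, spec limits and the 1.33 flag threshold below are hypothetical.

```python
# Screening many process columns by capability (Ppk) and a stability-style index.
# Illustrative only; spec limits, thresholds, and data are made up.
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n = 300
processes = pd.DataFrame({
    "p1": rng.normal(10.0, 0.2, n),                            # capable, stable
    "p2": rng.normal(10.0, 0.2, n) + np.linspace(0, 1.0, n),   # drifting
    "p3": rng.normal(10.6, 0.4, n),                            # off-center
})
specs = {"p1": (9, 11), "p2": (9, 11), "p3": (9, 11)}  # (LSL, USL), assumed

summary = []
for name, x in processes.items():
    lsl, usl = specs[name]
    overall_sigma = x.std(ddof=1)
    within_sigma = x.diff().abs().mean() / 1.128   # moving-range estimate (d2 = 1.128)
    mu = x.mean()
    ppk = min(usl - mu, mu - lsl) / (3 * overall_sigma)
    summary.append({"process": name,
                    "Ppk": round(ppk, 2),
                    "stability_index": round(overall_sigma / within_sigma, 2),
                    "flag": "review" if ppk < 1.33 else "ok"})

print(pd.DataFrame(summary).sort_values("Ppk"))
```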
And so it takes care of some of that selection bias because you're sorting all those those p values.   and selecting the behind...the low P values. And I find that renting a videotape changed the most, video cassette tape, and of course they became obsolete. So, of course, it changed.   Another video cassette tape changed a lot, collected things to recycling changed a lot, entertained people in my home changed a lot.   And here's the one that's less significant, but big effect size, do you use the internet. And of course this survey was started before there was an internet. And so, it changed a lot. So the question on what changed the most was   is easy now, where it used to be hard.   Another question you asked about big data, where are the missing values. So here are   280 variables.   Where are the missing values? Do I need to worry about them? And I can look at the explore missing values report and see that, well, there are only five of these variables with missing values.   And some of them only have one variable, one missing value, but some of them have a lot. So when I do an analysis, a multivariate analysis, I probably don't want to include   376 out of 452 variables. Or if I want to include J, I can go through   and impute those missing values   by doing that, okay.   So,   Next question, does the data have outliers?   Well, I have 387 measurements, I do in this process. And I want to find out if there are outliers in there. So now we have a facility to do it in the screening menu. I can   make this more or less sensitive.   Let's make it more sensitive   so there's fewer outliers and rescan.   I can look   at   the nines. So often, a string of nines is used to represent a missing value and those nines may be real nines, but they may be just an indicator of a missing value. And so I can, for those, I can say well add those nines to missing value codes and now the memory has changed.   And now I can   go back up (there's a lot of variables here) and rescan and there's fewer missing values to worry about.   So exploring outliers used to be a pain.   It still can be a big job, but it's a lot easier than it used to be.   Now, in version 15, we added another screening platform. Does the data have suspect patterns? And so here's some lab data from clinical trial. This is nicardipine lab patterns data. There's 27 laboratory results that I may want to look at. And so I invoke   the new platform, explore missing values. And this is going to show me, do I have a run of values? So I have the value .03536 but there's seven in a row, starting in row.   2065. I can colorize those cells, and I can look at those those values, and maybe it's the last value carried forward, which may not be suspicious. In this case, it's the...   it's the same person. So maybe last value carried forward is a reasonable thing to do, but it is a rare, rare event if you've distributed these things...   if, if you assume random distribution for these things.   Also longest duplicated sequence. So for this variable and we're starting in row 2816 and also starting in row 3034, there are four in a row that had the same values. So if I colorize those things, there's four in a row there. And if I   go to the next sequence, there's four in a row that have the same values. So that might be a symptom of cutting and pasting the same values from one place to another.   So explore patterns is looking for those things. 
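Two of the checks described above, runs of identical consecutive values and sentinel codes such as strings of nines standing in for missing values, are easy to sketch outside JMP. A minimal Python illustration on an invented column; Explore Patterns and Explore Outliers do considerably more than this.

```python
# Minimal sketch of two data checks: longest run of a repeated value, and
# sentinel codes (e.g., 999) that probably mean "missing". Data is invented.
from itertools import groupby
import numpy as np
import pandas as pd

values = pd.Series([0.031, 0.034, 0.035, 0.035, 0.035, 0.035, 0.029,
                    999.0, 0.028, 999.0, 0.033])

# Longest run of identical consecutive values (a long run can flag
# last-value-carried-forward or copy/paste artifacts).
runs = [(val, sum(1 for _ in grp)) for val, grp in groupby(values)]
longest_val, longest_len = max(runs, key=lambda r: r[1])
print(f"longest run: value {longest_val} repeated {longest_len} times")

# Treat the sentinel code as a missing value code, as in "add those nines
# to missing value codes" in the demo.
cleaned = values.replace(999.0, np.nan)
print(f"missing after recoding sentinel: {int(cleaned.isna().sum())}")
```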
And there are many other things; you can look at the details on each of those 27 variables and look for symptoms of suspicious things or bad effects of the way you processed the data and so on.   So, explore patterns is part of solving big problems.   So, pain relief. Much of our development is focused on making the flow of analyses smoother, less burdensome, less time-consuming and less painful. Analgesics.   Now we don't always know what's painful, so we depend on feedback for what to focus on. When we get those emails that say "I had to respond to a prompt hundreds of times,"   we listen to that, and we feel the pain, and we fix it, so that now you can broadcast into a by group with thousands of things   and broadcast the results of one dialogue rather than dialoguing many times. So sending it in made it better for everyone else, because we didn't catch it the first time around. So please give us your feedback.   With all the improvement we've already made, we think the process of data preparation and analysis has already become much smoother, much less interrupted, more in flow.   So, instead of spending your time getting over obstacles, you spend your time learning from your data, understanding your data.   Analgesics and analytics. One would wonder if they came from the same root. Of course, we don't like to abbreviate those two words.   Analytics comes from the Greek analyein, and I don't know how to pronounce that.   But in Greek, it means to break up, to release, or set free. And it's taking something complicated and breaking it up into pieces so we can understand it.   And that's, of course, exactly what we do with analyzing data, data science. And analgesics comes from a different combination of words: "an-" meaning without, and "algesis," which is the sense of pain. So same prefix, a little bit different roots.   And don't abbreviate those.   Now, anabolic and aphrodisiac. We care about power.   Much of what we do is to give you more powerful tools for analyzing data, much of that in JMP Pro, as well as in JMP.   And we hope that data makes it exciting, you know, the thrill of discovery.   The thrill of learning how to use power features in JMP, and we think it's exciting. You know, it's an aphrodisiac.   So power and excitement are also of value to us. It's not just pain relief.   So what are we going to do next year?   Well, big words starting with the letter B.   Start with A, next is B, right?   Well, next year we have JMP 16 coming, and so that's what we're going to talk about. Who knows what we're going to talk about the year after. So thank you very much. And we hope you suffer very little pain in analyzing your data. Jeff Perkinson Thank you very much, John. We appreciate that. It was a fantastic talk, and having been around to witness a lot of the pain over the years,   everything you say is absolutely true. Pain relief is an important thing for us.   We did have one question that came in; actually, a number of questions. We've answered some of them in the Q&A, but what I wanted to throw to you is: what feature has both relieved pain and provided some attraction and made JMP more powerful? John Sall Well, I think everyone's big delight is Graph Builder.   It feels incredibly powerful to just drag those variables over and do a few other clicks and you have the graph that you want, and you can change it so easily.   So it's a thrill. It's a powerful feature and it's pain relief; it used to be harder to do. So I think that's everyone's favorite thing.   
But of course there's...JMP is a rich product and we're proud of everything in it. Design of experiments, all   the great power involved in there and we've tried to make that process easy as well. And so many things come to mind. Jeff Perkinson Very good. Thank you very much, John, I appreciate it. If you have enjoyed this talk, I have two suggestions for you. One, we will be posting this
The purpose of this poster presentation is to display COVID-19 morbidity and mortality data available online from Our World in Data, whose contributors ask the key question: "How many tests to find one COVID-19 case?" We use SAS JMP Analyze to help answer the question. Smoothing test data from Our World in Data yields seven-day moving average, or SMA(7), total tests per thousand in five countries for which coronavirus test data are reported: Belgium, Italy, South Korea, the United Kingdom and the United States. Similarly, seven-day moving average, or SMA(7), total cases per million were derived using the Time Series Smoothing option. Coronavirus tests per case were calculated by dividing smoothed total tests by smoothed total cases and multiplying by a factor of 1,000. These ratios of smoothed tests to smoothed cases were themselves smoothed. Additionally, Box-Jenkins ARIMA(1,1,1) time series models were fitted to smoothed total deaths per million to graphically compare smoothed case-fatality rates with smoothed tests-per-case ratios.   Auto-generated transcript...   Speaker Transcript Douglas Okamoto In our poster presentation we display COVID-19 data available from Our World in Data, whose database sponsors ask the question: why is data on testing important? We use JMP to help us answer the question. Seven-day moving averages are calculated from January 21 to July 21 for daily per capita COVID-19 tests and coronavirus cases in seven countries: the United States, Italy, Spain, Germany, Great Britain, Belgium and South Korea. Coronavirus tests per case were calculated by dividing smoothed tests by smoothed cases and multiplying by a factor of 1,000. Daily COVID-19 test data yields smoothed tests per thousand in Figure 1. Testing in the United States, in blue, trends upward, with two tests per thousand daily on July 21st, 10 times more than South Korea, in red, which trends downward. The x-axis in Figure 1 is normalized to days since moving averages reached one or more tests per thousand. In Figure 2, smoothed coronavirus cases per million in Europe and South Korea trend downward after peaking months earlier than the US, in blue, which averaged 2,200 cases per million on July 21st, with no end in sight. The x-axis is normalized to the number of days since moving averages reached 10 or more cases per million. Combining tabular results from Figure 1 and Figure 2, smoothed COVID-19 tests per case in Figure 3 show South Korean testing, in red, peaking at 685 tests per case in May, 38 times the US performance, in blue, of 22 tests per case in June. Since the x-axis is dated, Figure 3 represents a time series. The reciprocal of tests per case, cases per test, is a measure of test positivity: one in 22, or 4.5%, positivity in the US compares with 0.15% positivity in South Korea and 0.5 to 1.0% in Europe. At a March 30 WHO press briefing, Dr. Michael Ryan suggested a positivity rate of less than 10%, or even better, less than 3%, as a general benchmark of adequate testing. JMP Analyze was used to fit Box-Jenkins time series models to smoothed tests per case in the US from March 13 to April 25; predicted values from April 26 to May 9 were forecast from a fitted model, an autoregressive integrated moving average, or ARIMA(1,1,1), model. In Figure 4, a time series of smoothed tests per case from mid-March to April shows a rise in the number of US tests per case, not a decline as predicted during the 14-day forecast period. 
In summary, 10 or more tests were performed per case, providing adequate testing in the United States. COVID-19 testing in Europe and South Korea was more than adequate, with hundreds of tests per case. Equivalently, the positivity rate, or number of cases per test, was less than 10% in the US, whereas positivity in Europe and South Korea was well under 3%. When our poster was submitted, the US totaled 4 million coronavirus cases, more than the European countries and South Korea combined. The US continues to be plagued by state-by-state disease outbreaks. Thank you.  
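The smoothing and ratio steps in the poster are straightforward to reproduce outside JMP. A hedged pandas/statsmodels sketch with invented data and column names; the ARIMA(1,1,1) call mirrors the Box-Jenkins model mentioned in the abstract, not the exact JMP fit.

```python
# Sketch of the poster's pipeline: 7-day moving averages, tests-per-case ratio,
# and an ARIMA(1,1,1) forecast. Data and column names are assumed.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(6)
dates = pd.date_range("2020-01-21", periods=180, freq="D")
daily = pd.DataFrame({
    "new_tests_per_thousand": np.abs(rng.normal(1.5, 0.3, 180)),
    "new_cases_per_million": np.abs(rng.normal(60, 20, 180)),
}, index=dates)

sma = daily.rolling(7).mean()                      # SMA(7) smoothing
# tests per thousand divided by cases per million, times 1,000, gives tests per case.
tests_per_case = (1000 * sma["new_tests_per_thousand"]
                  / sma["new_cases_per_million"]).rolling(7).mean()
positivity = 1 / tests_per_case                    # cases per test

# Box-Jenkins ARIMA(1,1,1) fit and a 14-day forecast on the smoothed series.
series = tests_per_case.dropna()
fit = ARIMA(series, order=(1, 1, 1)).fit()
print(fit.forecast(steps=14).tail())
```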
Pranjal Taskar, Formulation Scientist II, Thermo Fisher Scientific Brian Greco, Formulation Scientist I, Thermo Fisher Scientific Sabrina Zojwala, Formulation Scientist I, Thermo Fisher Scientific Kat Brookhart, Manager, Formulation & Process Development, Thermo Fisher Scientific Sanjay Konagurthu, Sr. Director, Science and Innovation, Drug Product NA Division Support, Thermo Fisher Scientific   Pharmaceutical tableting is a process in which an active moiety is blended with inert excipients to achieve a compressible mixture. This mixture is consolidated into the final dosage form: a tablet. The process of tableting considers different composition-related and process variables impacting quality attributes of the final product. This work focuses on using JMP software to identify main effects. An I-optimal, 19-run custom design was outlined with the factors being type and ratio of filler used (microcrystalline cellulose, mannitol vs lactose, categorical), percentage active spray dried dispersion loading (continuous), order and amount of addition (intragranular vs. extragranular, continuous), and ribbon solid fraction (continuous). The responses were outlined as bulk density, Hausner ratio, percentage fines, blend compressibility and tablet disintegration. The model evaluated with the main effects and second degree interaction terms. The data was evaluated using Standard Least Squares in the Fit Model function. Results determined that lactose provided the blend with a higher initial bulk density, however mannitol maintained bulk density post compression. Microcrystalline cellulose improved flow properties of the blend and high percentage intragranular addition provided material with higher bulk density and improved material flow.     Auto-generated transcript...   Speaker Transcript Pranjal Taskar All right. Thank you, Peter. So I'm going to get started now. Hello everyone. Today I'm going to talk about my poster. This poster is regarding systematic analysis of targeting, which includes effect of formulation and process variables on final quality attributes of my product. So delving into all the statistical analysis before that, I wanted to give a background about what exactly we're talking about. What is tableting? Tableting is a pharmaceutical process. Looking over in the introduction, I'm going to talk about what tableting is a little bit. It's a pharmaceutical process in which your active ingredient or active moiety (API) is blended with other excipients to form a free flowing good flowing blend and this blend is compressed into our final dosage form, which is a tablet. So in a lot of situations, there are some active moieties or APIs, as we would call it, that have a low bioavailability and that could be due to their crystalline nature. They're just too stable, too rigid in their ways. So our site kind of specializes into making this crystalline API, a little bit more soluble, little bit more reactive amorphous form and it makes it into like a more bioavailable form. And when we do that, we fortified this API by a polymer. This this intermediate that we form is a tablet intermediate called a spray dried intermediate, SDI. And this is what we basically use in our tablets as our active intermediate. But when you look at it, it has poor flow ability and it's extremely fluffy. So when you have to incorporate this API into your tablet, you need to have other pharmaceutical processes involved to make it more streamlined, to make the blend more flowable. So this is what we're going to do. 
In this study, we are going to identify our critical quality attributes, the variables that matter, or our dependent variables and then we are going to identify variables that impact our critical quality attributes, which are the composition of that tablet of that blend and then different process processing parameters that we used in us in tableting. Which of these are main effects? Are there any interactions? And then we'll use JMP to identify all of these main effect and interaction variables and try to catch out the tableting process basically. So this was the introduction. Moving on to the methods and objectives. So how do we do this? For this study we looked at a placebo formulation. There is no active product or actor moiety and we used a commonly used spray-dried polymer which is hypromellose acetate succinate. We spray dried it and made it into the fluffy blend that it usually is. And Figure 2 talks about our usual granulation tabulating process. So, what, what we do is basically have our spray-dried intermediate (SDI) blended along with other excipients using this blender. We move on to roller compaction, which is densification of this blend using these there are rollers right here and these rollers move slowly to densify the blend which goes into this hopper and you get ribbons out of the roller compactor. Now what you have done is you have made that fluffy material into densified ribbons and you mill it down using a comil. And you get granules. These granules are more dense and they are a lot better flowing than your API or your SDI. So looking at this entire process, there are a lot of variables that go in there that you need to change and look out for. So what are those variables? This diagram over here will identify different kinds of variables, the independent variable variables that go into the formulation and process. so The first variable would be a bit more base formulation related than the...rather than the process related. So it would talk about different types of ??? excipients that are used. And the ratio of these excipients that I used the percent of SDI loading, or active loading, and in our case, the placebo loading. And then the order of addition and the point of addition at where the SDI, or other excipients are loaded into the formulation. And then sorting process related parameters such as ribbon solid fraction, which basically talks about this equipment, the roller compactor and the speed at which the rollers and the spools move. We have also identified independent variables of our critical quality attributes that we look out for, which is bulk density of our blend, Which we look at before and after granulation and you have labeled it bulk density 1 and 2. Hausner ratio, which is again a ratio that depicts the flow of your blend and we also identify that before and after granulation, labeled as Hausner ratio 1 and 2. And the percent of fines that collect...are collected in the roller compaction process. And this is usually monitored after granulation. So all of these points out to talk about basically our method and why we chose our variables. What we did was we had an I optimal, 19-run custom design looking at all of these independent variables impacting on the dependent variables. And the way we analyze this model or the way we constructed effects, was that we looked at the main effect and the second degree interactions and we analyzed the data using the standard least squares personality in the fit model function. 
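The model description above (main effects plus second-degree interactions, fit with standard least squares) maps directly onto a formula-based least squares fit. Here is a hedged statsmodels sketch with hypothetical factor and response names standing in for the design columns described; the design itself would come from an I-optimal custom design, not the random values used to make this snippet runnable.

```python
# Sketch of a standard least squares fit with main effects plus all two-factor
# interactions (column names are placeholders for the design factors described).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 19  # the study used a 19-run I-optimal custom design
design = pd.DataFrame({
    "filler": rng.choice(["mannitol", "lactose"], n),
    "sdi_pct": rng.uniform(20, 40, n),
    "intragranular_pct": rng.choice([75, 95], n).astype(float),
    "solid_fraction": rng.uniform(0.5, 0.7, n),
})
design["bulk_density_2"] = (0.5 + 0.002 * design["intragranular_pct"]
                            - 0.003 * design["sdi_pct"]
                            + 0.1 * design["solid_fraction"]
                            + rng.normal(0, 0.02, n))

# "(a + b + c + d)**2" expands to all main effects plus two-way interactions.
model = smf.ols(
    "bulk_density_2 ~ (C(filler) + sdi_pct + intragranular_pct + solid_fraction)**2",
    data=design,
).fit()
print(model.summary())
```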
So, Identifying the process and the objectives, we will move on to results, but before doing that really quickly, I wanted to look at the JMP window which I have pulled up right now. These different columns are my independent and dependent variables and I'm going to highlight right here, these are the different independent variables that we are going to be looking at. So type of filler, which is the type of inert excipient and we have looked at mannitol and lactose. percent SDI, which is the active or in our case placebo loading, looking at highs and lows away here; and amount intragranular, so the amount of our excipients that we add before the roller compaction versus after the roller compaction and outline here are 75 and 95; and mannitol and lactose, which is a filler to MCC, which is micro course design cellulose ratio. Mannitol lactose are, I would say a little bit more excipient and MCC is more ???, gives more strength to the blend. So we have looked at a ratio of this to see how it impacts our tableting blend overall. And on the right are our responses. Bulk density 1, Hausner ratio 1, which is before granulation. Bulk density 2 and Hausner ratio 2, which is after granulation, and percent fines. So I'm gonna go over here quickly into this window and look at how we created our model, our response variables y, that I just talked about. And then our model effects which are secondary interactions and main effects. Standard least squares. That's what we used and I run the model. This is my effect summary right here and based on this data that we're looking at and prior experience, I'm going to take off the last two effects. Just remove that extra noise and then over here, I have my responses and how the data kind of impacts these responses. It would be just easier if we go down and look at the prediction profiler over here. And how all of these dependent variables are impacted by this. So I think it might just be easier if I pull up my poster and... Alright, so looking at the results over here, what we found out from Figure 3 was that, look, the two fillers lactose had higher bulk density initially, but post ruler compaction, the bulk density two of these fillers dropped and you can see a corresponding increase in the fines. So what we think would have happened is that lactose is more brittle in comparison to mannitol. And this generated all of that attrition and that fines and that impacted the flow, making it less bulky, drop in the bulk density. And the Hausner ratio, a little bit higher with the lactose. So basically, what we're doing is targeting a higher bulk density and we want a lower Hausner because a lower Hausner indicates a better flowing blend. So looking at the data, mannitol had a slight edge over lactose as a filler. And the, the second point would be talking about the solid fraction and overall we saw that there was a slight plateauing effect at around .6 solid fraction. Overall, we see that .7 has the least number of fines, which is why we see a recommended .7 with a maximum and desirability, but the plateau effect in terms of your flow properties (bulk and Hausner) start bottoming out at around .6 and onwards. that having lower SDI in general in the formulation had overall better flow properties. Just because the SDI, it's fluffy and it causes the blend to flow a lot worse. So the design just suggested us to have lower SDI loading. 
The next observation was that a higher amount of that excipient added in an intragranular fashion rather than extragranular is better, just because it improves your bulk density and gives a lower Hausner ratio, which means that your blend is flowing more smoothly. We also observed that, for the mannitol/lactose-to-MCC ratio, having more of that component was more desirable, and I say that because overall the fines dropped in the presence of a little bit more of the mannitol/lactose component, and that could be the reason why we are seeing this. We also have, in Figure 4, a couple of surface plots of a few interesting trends that I saw. In Figure 4A, you can see that having a higher SDI loading and having more amount intragranularly resulted in this hotspot right here of a very high Hausner ratio. So what this says is, basically, when you have more of that fluffy material intragranularly, your flow is going to be bad; but the corresponding picture after granulation is that, when you again have more of your excipient intragranularly and you're targeting a solid fraction of about .6 and above, your bulk density improves. So basically, post granulation, your blend is getting denser, and this is what these two diagrams show. So all of the result points basically cover the things I discussed right now. Overall, we conclude from our study that, in order to optimize this process and maximize desirability for formulations: one, a higher ratio intragranularly and a lower SDI loading would be a preferable formulation; and targeting a solid fraction of around 0.6 would also be beneficial to the formulation. Thank you very much. I would welcome your questions.
Carlos Ortega, Project Leader, Avantium Daria Otyuskaya, Project Leader, Avantium Hendrik Dathe, Services Director, Avantium   Creativity is at the center of any research and development program. Whether it is a fundamental research topic or the development of new applications, the basis of solid research rests on robust data that you can trust. Within Avantium, we focus on executing tailored catalysis R&D projects, which vary from customer to customer. This requires a flexible solution to judge the large amount of data that is obtained in our up to 64-reactor high-throughput catalyst testing equipment. We use JMP and JSL scripts to improve the data workflow and its integration. In any given project, the data is generated in different sources, including our proprietary catalyst testing equipment — Flowrence® —, on-line and off-line analytical equipment (e.g., GC, S&N analyzers and SimDis) or manual data records (e.g., MS Excel files). The data from these sources are automatically checked by our JSL scripts, and with the statistical methods available in JMP we are able to calculate key performance parameters, elaborate key performance plots and generate automatic reports that can be shared directly with the clients. The use of scripts guarantees that the data handling process is consistent, as every data set in a given project is treated the same way. This provides seamless integration of results and reports, which are ready to share on a software platform known to our customers.     Auto-generated transcript...   Speaker Transcript Carlos Ortega Yeah. Hi, and welcome to our presentation at the JMP Discovery Summit. Of course, we would have liked to give this presentation in person, but under the current circumstances, this is the best way we can still share the way we are using JMP in our day-to-day work and how it helps us rework our data. However, the presentation in this way, with the video, also has an advantage for you as a viewer, because if you want to grab a coffee right now you can just hit pause and continue when the coffee is ready. But looking at the time, I guess the summit is right now well under way, and most likely you have heard already quite some exciting presentations on how JMP can help you make more sense out of your data, to apply statistical tools to gain deeper insight and dive into more parts of your data. However, what we want to talk about today (and this is also hidden under the title about data quality assurance) is the scripting engine — everything which has to do with JSL scripting — because this helps us a lot in our day-to-day work to prepare the data, which are then ready to be used for data analysis. And by "we" I mean Carlos Ortega, Daria Otyuskaya, and myself, whom I now want to introduce a bit, so that you get a better feeling for who's doing this. But of course, as usual, there are some rules to this, which are the disclaimer about the data we are using. And if you're a lawyer, for sure you're going to press pause to study this in detail; for all other people, let's dive right into the presentation. And of course nothing better than to start with a short introduction of the people. You see already the location we all have in common, which is Amsterdam in the Netherlands, and we all have in common that we work at Avantium, a provider of sustainable technologies. However, the locations we are coming from are all over the world.
We have, on the one hand, on the left side, Carlos Ortega, a chemical engineer from Venezuela, who has lived in Holland for about six years and has worked at Avantium for about two years as a project leader in services. Then we have on the right side Daria Otyuskaya from Russia, also working here for about two years and having spent the previous five years in the Benelux area, where she did her PhD in chemical engineering. And myself — I have the only advantage that I can travel home by car, as I originate from Germany. I have lived in Holland for about 10 years and joined Avantium about three years ago. But now, let's talk a bit more about Avantium. I just want to briefly lay out the things we are doing. Avantium, as I mentioned before, is a provider of sustainable technologies and has three business units. One is Avantium Renewable Polymers, where we actually develop a biodegradable polymer called PEF, which is one hundred percent plant based and recyclable. Second, we have a business unit called Avantium Renewable Chemistries, which offers renewable technologies to produce chemicals like MEG or industrial sugars from non-food biomass. And last but not least, a very exciting technology where we turn CO2 from the air into chemicals via electrochemistry. But I won't talk too much about these two business units, because Carlos, myself and Daria are all working in Avantium Catalysis, which was founded 20 years ago and is still the foundation of Avantium's technology innovations. We are a service provider, accelerating the research in your company — in catalyst research, to be more specific. And we offer there, as you can see on the right hand side, systems, services and a service called refinery catalyst testing, and we really help companies to develop their R&D, as you see at the bottom. But this is enough about Avantium. Let's talk a bit about how we are working in projects and how JMP can actually help us there to accelerate things and get better data out of it, which Carlos will later on show in a demo for us. As mentioned before, we are a service provider, and as a service provider we get a lot of requests from customers to develop a better catalyst or a better process. And now you might ask yourself, what's a catalyst? A catalyst is actually a material which participates in a reaction when you transform A to B, but doesn't get consumed in the reaction. The most common example, which you can see in your day-to-day life, is the exhaust gas catalyst which is installed in your car, which turns exhaust gases from your car into CO2 and water. And these are the things which we get as requests. People come to us and say, "Oh, I would like to develop a new material," or things like, "I have this process, and I want to accelerate my research and develop a new process for this." And when we have an experiment in our team, we are designing experiments; we are trying to optimize the testing for this, and for all of that we use JMP — but this is not what we want to talk about today. Because, as I said before, we are using JMP also to actually merge our data, process them and make them ready, which is the two parts you see at the bottom of the presentation.
We are executing research projects for customer in our proprietary tool called Flowrence, where the trick is that we don't experiment...don't execute tests, one after another, but we execute in parallel. Traditionally, I mean, I remember myself in my PhD, you execute a test one reactor after another, after another, after another. But we are applying up to 64 reactors in parallel, which makes the execute more challenging but allows a data-driven decision. It allows actually to make more reliable data and make them statistically significant. And then we are reporting this data to our customers, which then can either to continue in their tools with their further insights or completely actually rely on us for executing this data and extracting the knowledge. But yeah, enough about the company. And now let me hand over to Carlos, which will explain how JMP and JMP script actually helps us to make us our life significantly easier. Thank you, Hendrik,for the nice introduction. And thank you also for the organizers for this nice opportunity to participate in the JMP discovery summit. So as Hendrik was mentioning, we develop and execute research projects for third parties. And if we think about it, we need to go from design of experiments (and that's of course one very powerful feature from JMP), but also we need to manage information and in this case, as Hendrik was was mentioning, we want to focus on JSL script that allows us to easily handle information and create seamless integration of a process workflows. I'm a project leader in the R&D department and so a day...a regular day in my life here would look something like this. And so very simplistic view. You would have clients who are interested and have a research question and I design experiments and we execute these in our own proprietary technology called Flowrence. So in a simple view the data generated in the Flowrence unit will go through me after some checks and interpretation will goes back to the client. But the reality is somewhat more complex and on one hand, we also have internal customers. That is part of...for example our development team...business development team. And on the other side, we also have our own staff that actually interacts directly with the unit. So they control how the unit operates and monitor everything goes according to the plan. And the data, as you see here with broken lines, the data cannot be struck directly from the unit. The data is actually sent to a data warehouse and then we need a set of tools that allows us to first retrieve information, merge information that comes from different sources, execute a set of tasks that go from cleaning, processing, visualizing information, and eventually we export that data to the client so that the client can get the information that they actually need and that is most relevant for them. If you'll allow me to focus for one second on these different tasks, what we observed initially in the retrieve a merge is that data can actually come from various sources. So in the data warehouse, we actually collect data from the Florence unit, but we also collect data from the analyzer. So for those that they're performing tests in a laboratory, you might be familiar with the mass spectrometry or gas chromatography, for example, and we also collect data on the unit performance. So we also verify that the unit is is behaving as expected. In...as in any laboratory, we would also have manual inputs. 
And these could be, for example, information on the catalysts that we are testing or calibrations of the analytical equipment. Those manual inputs are, of course, always stored in a laboratory notebook, but we also include that information in an Excel file. And this is where JMP is actually helping us take the workflow of information to the next level. What we have developed is a combination of an easy-to-use, widely known Excel file with the powerful features of a JSL script. Not only do we include manual data that is available in laboratory notebooks, but we also include in this Excel file formulas that are then interpreted by the JSL script and executed. That allows us to calculate key performance parameters that are tailored, or specifically suited, to different clients. If we look in more detail into the JSL script — and in a moment I will go into a demo — you will observe that the JSL script has three main sections. One section prepares the local environment. So on one side we say we want to clear all the symbols and close tables, but probably the most important feature is when we define "Names Default To Here." That allows us to run parallel scripts without having any interference between variables that are named the same in different scripts. Then we have a section, collapsed in this case so that we can show it, that creates a graphical user interface. The user does not interact with the script itself, but actually works through a simple graphical user interface with buttons that have descriptive button names. And then we have a set of tasks that are already coded in the script, in this case in the form of expressions. That has two main advantages: one, it's easy to later hook them up to the graphical user interface; and second, when you have an expression, you can use that expression several times in your code. OK, so moving on to the demo. I mentioned earlier that we have different sources of data. On one side we have data that is in fact stored in our database, and this database will contain different sources of information, like the unit or different analyzers. In this case, you see an example Excel table, only for illustration. This data is actually taken from the data warehouse directly with our JSL script, so we don't look at this Excel table as such; we let the software collect the information from the data warehouse. And probably what is most important is that this data, as you see here, can come again from different analyzers, and it is structured so that the first column contains the variable names — in this case, we have made up some dummy names for reasons of confidentiality — and you will also see that all the observations are arranged in rows. So every single row is an observation. And depending on the type of test and the unit we are using, we can collect up to half a million data points in one single day. That depends of course on the analyzer, but you are immediately faced with the amount of data that you have to handle, and a JSL script that helps you process information can help you with this activity. Then we also use another Excel file, which is also very important, which is an input table file. And these files, together with the JSL script, are the ones creating the synergy that allows us to process data easily.
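For readers who have not scripted in JMP before, the three-section layout described here looks roughly like the sketch below. The window title, button labels and placeholder tasks are illustrative only, not Avantium's actual code.

Names Default To Here( 1 );                      // keep this script's variables separate from other scripts
Clear Symbols();
Close All( Data Tables, No Save );               // start from a clean environment

// Tasks are wrapped as expressions so they can be reused and attached to buttons
retrieveData = Expr(
	dt = Open( "$DESKTOP/example_raw_data.jmp" ) // placeholder for the real data-warehouse pull
);
exportData = Expr(
	Write( "Exporting the client-facing columns..." )  // placeholder task
);

// A small graphical user interface with descriptive button names
New Window( "Project workflow",
	Button Box( "Retrieve and merge data", retrieveData ),
	Button Box( "Export client table", exportData )
);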
What you see in this case, for example, is a reactor loading table and we see different reactors with different catalysts. And this information that seems... is not quantitative, but the qualitive the value is important. And then if we move to a second tab, and these steps are all predefined across our projects, we see the response factors for the analyzers. Different analyzers will have different response factors and it's important to log this information into use through the calculations to be able to get quantitative results. In this case, we observed that the condition that the response factors are targeted by condition instead. Then we have a formula tab. And this is probably a key tab for our script. You can input formulas in this Excel file. You make sure that the variable names are enclosed into square brackets. And the formula, you can use any formula in Excel. Anyone can use Excel; we're very much used to it. So if you type a formula here, that follows ??? syntax in Excel, it will be executed by our JSL script. Then we also included an additional feature we thought it was interesting to have conditionals. And for the JSL script to read this conditional, the only requirement is that the conditionals are enclosed in braces. There are two other tabs I would like to show you, which are highly relevant. One is a export tables tab and the reason that we have this table is because we generate many columns or many variables from my unit, probably 500 variables. But actually the client is only interested in 10, 20 or 50 of them. Those are the ones that really add value to their research. So we can input those variables here and send it to the client. And last but not least, I think many of us have been in that situation where we send an email to a wrong address and that can be actually something frightening when you're talking about confidential information. So we always double, triple check the email addresses and but does it...is it really necessary? So what we are doing here is that we have one Excel file that contains all manual inputs, including the email address of our clients. And these email addresses are fixed so there is no room for error. Whenever you have run the JSL script the right email addresses will be read and the email will be created and these we will see in one minute. So now going into the JSL script, I would like to highlight the following. So the JSL script is initially located in one single file in one single folder and the JSL script only needs one Excel file to write that contains different tabs that we just saw in the previous slide Once you open the JSL script, you can click on the run script button and that will open the graphical user interface that you see on the right. Here we have different options. In this case we want to highlight the option where we retrieve data from a project in that given period. We have selected here only one day this year, in particular, and then we see different buttons that allows us to create updates, for example. Once we have clicked on this button, you will see to the left on the folder that two directories were created. The fact that we create these directories automatically help us to have harmony or to standardize how is a folder structured also across our projects. If you look into the raw database data, you will see the two files were created. One contains the raw data that comes directly from the data warehouse. 
And the second, the data table contains all merge information from the Excel file and different tables that are available in the data warehouse. The exported files folder does not contain anything at this moment, because we have not evaluated and assessed the data that we created in our unit is actually relevant and valuable for the client. We do this, we are, we ??? and you see here that we have created a plot of reactor temperature versus the local time. And different reactors would be plotted so we have up to 64 in one of our units. And in this case we color the reactors, depending on the location on the unit. Another tab we have here, as an example, is about the pressure. And you see that you can also script maximum target and minimum values and define, for example, alerts to see if value is drifting away. The last table I want to show is a conversion and we see here different conversions collapsed by catalyst. So once we click the export button, we will see that our file is attached into an email and the email already contains the addresses...the email addresses we want to use. And again, I want to highlight how important it is to send the information to the right person. Now this data set is actually located into the exported files folder, which was not there before. And we always can keep track of what information has been exported and sent to the client. With this email then it's only a matter of filling in the information. So in this case, it's a very simple test. So this is your data set, but of course we would give some interpretation or gave maybe some advice to the client on how to continue the tests. And of course, once you have covered all these steps you will close the graphical user interface and that will also close all open tables and the JSL script. Something that I would like to highlight at this point is that these workflow using a JSL script is is rather fast. So what you saw at this moment, of course, it's a bit accelerated because it's only a demonstration, but you don't spend time looking for data and different sources, trying to merge them with the right columns. All these processes are integrated into a single script and that allows us to report to the client on a daily basis amounts of data that otherwise would be would...would not be possible. And the client can actually take data driven decisions with a very fast pace. That's probably the key message that I want to deliver with with this script that we see at this moment. Now, well, I would like to wrap up the presentation with with some concluding remarks and some closing remarks. And so on one side, we developed a distinctive approach for data handling and processing. And when we say distinctive it's because we have created a synergy between an Excel file that most people can use because you are very familiar with Microsoft Office and a JSL script which doesn't need any effort to run. So you click Run, you will get a graphical user interface and a few buttons to execute tasks. Then we have a standardized workflow. And that's also highly relevant when you work with multiple clients and also also from a practical point of view. For example, if one of my colleagues would go on holiday, it will be easy for another project leader for myself, for example, to take over the project and know that all the folder structures are the same, that all the scripts are the same and the buttons execute the same actions. 
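One piece of that standardized workflow is worth sketching in code: turning an Excel-style formula whose variable names are enclosed in square brackets into a JMP column formula. The snippet below is a minimal illustration under assumed table, column and KPI names, not the actual Avantium script.

// Illustrative only: convert "[Var]" references into JSL column references, then build a formula column
dt = Data Table( "merged project data" );                                  // hypothetical table name
fstr = "[Flow_out] / [Flow_in] * 100";                                     // as typed in the input Excel file
jslStr = Regex( fstr, "\[([^\]]+)\]", ":Name(\!"\1\!")", GLOBALREPLACE );  // [Var] -> :Name("Var")
Eval( Eval Expr(                                                           // embed the parsed expression as the column formula
	dt << New Column( "Conversion KPI", Numeric, "Continuous",
		Formula( Expr( Parse( jslStr ) ) ) )
) );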
Finally, we can guarantee seamless integration of data, and these fast updates of information — with thousands or even half a million data points per day — can be quickly sent to clients, which allows them to make almost online, data-driven decisions. In the end, our purpose is to maximize customer satisfaction through a consistent, reliable and robust process. Well, with this, I would like to thank again the organizers of this Discovery Summit and, of course, all our colleagues at Avantium who have made this possible, especially those who have worked intensively on the development of these scripts. If you are curious about our company or the work we do in catalysis, please visit one of the links you see here. And with this, I'd like to conclude. Thank you very much for your attention, and we look forward to your questions.
Martin Kane, Managing Scientist, Exponent   Analytical methods for pharmaceutical development often require the use of dose-response curves and the fitting of an appropriate statistical model. Common functions used are the Rodbard and Hill function: different parameterizations of four parameter logistic functions. This presentation will discuss using JMP and the JMP Scripting Language to fit these non-linear functions, even when then are ill-behaved. Real-world data will be used to demonstrate how to use JMP’s various non-linear fitting routines and possible methods of dealing with messy data.     Auto-generated transcript...   Speaker Transcript mkane Okay. Hi everybody, my name is Martin Kane. I'm with Exponent and I'm here to talk about dose-response curve fitting for ill-behaved data here at the 2020 JMP Discovery Summit. First of all, I'd like to thank the conference advisory committee for inviting me to give this talk. And I really appreciate the opportunity and a chance to share the learnings that I've done through JMP. I use JMP all the time every day. And as a consultant, it becomes my primary tool for performing analysis. So this is something that I have been working on recently and thought it would be a good thing to share. So let's get into this. First of all, there's a disclaimer. The ideas in these slides belong to me, Martin Kane, do not necessarily represent those of my company Exponent. So with that being said, what are we going to talk about? So what are dose-response models? What shape do they often follow? Typical statistical models for those. How do we access these models in JMP? Difference between curve fit and nonlinear. What are the benefits of each? And what are the drawbacks of each? That's an area I will spend some time on. And we'll talk about initial values as well and the importance of having good solid initial values for these nonlinear models. I have a demonstration and I will then use the data in that demonstration to look at it ill-behaved data. What does that mean, and what can we do about it using the curve fit and nonlinear platforms in JMP? Okay. Dose-response models. So they can come in both linear and nonlinear formats. Typical models, though, are based on what we call the 3, 4 or 5 parameter logistic models. Those are very typical and there are many of them; there aren't just three. So what are the shapes of some of these models? Obviously linear, linear, excuse me, is a straight line, just a standard regression where we've plotted some sort of response concentration often versus...our log concentrations on the x axis versus the y axis, which is some sort of a response. And I will talk about it in just the next slide, but oftentimes this is based on some sort of fluorescence. And those values can be quite large, in terms of their range. So it's not uncommon to take the log values of those as well. We can have the exponential type model. This might work in some portion of the dose-response curve, but often is not sufficient for the entire curve. But more common than not is some form of... some sort of parameter logistic, and this example shows the four parameter logistic, the Hill function. This can also be called in a slightly different orientation, the Rodbard function. There's several different versions that have this same general shape, where we have some sort of upper asymptote, some sort of lower asymptote. There is a center point along this curve, somewhere halfway between the upper and lower asymptote, which we call the EC50 or IC50 value. 
That point is on the x axis; that's how we use it. And then we also have a slope to this curve, and this slope is here in this linear section — it's not truly linear, but it's close to linear. The slope is the fourth parameter, and at the bottom, that's the a parameter in this equation. d is our upper asymptote, that's the top; c is the lower asymptote at the bottom; I mentioned a was the slope; and b is the IC50 or EC50 value, the halfway point, and the x-axis concentration for that. So this is the equation of a typical 4 parameter logistic function. Okay. So, in the world of biologics and pharmaceuticals, a lot of the standard test method format is based on what's called an assay, and assays themselves are nothing more than a test method for biologics, for pharmaceuticals. In this particular case, what we see here is a standard 96-well plate that's used for these types of assays. Each of these little circles represents a well — a little divot in a plastic plate where materials can be put. And so we can fill up all 96 wells, which will have some sort of binding agent on the bottom of the wells and some sort of fluorescent material in them as well. Once these are put under a certain wavelength of light, they will fluoresce, and depending on how much binding takes place, you'll get different fluorescence values. And like I said, those fluorescence values can go anywhere from 10 to 100,000 or maybe even a million. It just depends on the format; it can be quite large, though. Typically on a plate we will put seven or eight different concentrations of a curve, and the curve would be, as we showed in the previous slide, representative of one single material with various amounts for the concentrations. Typically we start at the top of the plate, where we put the highest concentration, and we might serially dilute that concentration down into the wells below it. So if the top starts out at, say, a value of 16, we might do a 4-to-1 serial dilution, and so we end up with four in the next row, then one, one-fourth, one-16th, and so on down the line. So we end up with serially diluted material going down the plate. Oftentimes there are duplicates, so columns five and six might have the same material, just in there twice, and that's so that we can get some form of variability in our curve. Oftentimes as well, when we're running an assay for a biologic or pharmaceutical, we will test multiple doses at the same time. The point there is that on the same plate, at the same time, we have various doses that we're trying to compare to one another. So for instance, columns 11 and 12 might have one dose, call it one milligram per kilogram, and columns 9 and 10 might have a different dose at, say, .1 milligrams per kilogram, and we might want to compare those different doses at the same time. The other thing to mention is JMP has the ability to test for what's called parallelism — I'll talk about that — and built in, there are functions for testing parallelism using either the F test or the chi-square method. Okay, so let's take a look at some data in JMP and get right into it. So here we go. Right over here on the left I have a JMP journal that I'm going to use for this demo, and this is for nonlinear bioassay materials and for ill-behaved data. The two platforms, as I mentioned earlier, that I will be discussing are the curve fit and the nonlinear platforms. So let's open up our sample data. Let's pull up some sample data.
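For reference, the curve just described can be written compactly. This is one common parameterization consistent with the parameter names used above (c = lower asymptote, d = upper asymptote, a = slope or Hill coefficient, b = EC50/IC50 on the concentration axis); the exact parameterization JMP uses may differ in sign or scale conventions.

f(x) = c + \frac{d - c}{1 + (x/b)^{-a}}

At x = b the response is exactly (c + d)/2, which is why b is read off as the EC50 or IC50. On a log-concentration axis the same curve can be written as f(x) = c + (d - c)/(1 + \exp(-a\,(\log x - \log b))).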
Now I initially had wanted to use an actual data from a client, they declined to let me do that. But the data that JMP has built in, in the bio assay sample data set, works just fine. It's very similar to what I would have used and we can use that. So first of all, let's take a look and see what we have. We have some sort of concentration, as I mentioned, the serial dilution. In this particular case, it looks like each row down is three-fourths of the row above it. There's some sort of log concentration. That's just log 10 of the concentration. Formulation looks like it has various formulations, or those could just as easily be doses. And toxicity, that may be the y value, our response could be fluorescence or log or fluorescence, something like that. So if we take a look at this data using Graph Builder, just because it's easy, we put toxicity in the y axis and maybe we put the log concentration on the x axis. We can see that there is a similar looking function to that 4 parameter logistic that I mentioned earlier, except it's reversed in terms of its direction. That's not a problem. The cubic spline that's used fits the data quite well. We can remove that and just look at the data. Now, obviously there's a lot of data here. Looks like there's four values per concentration. Oh, there was a formulation that we haven't talked about yet. And that, we could take that and we could do various things, right, in Graph Builder. We can put that in group y, and we can get four different curves out of this. Let's use cubic spline. standard, Test A, Test B, and Test C. Now you can change the colors. Those are harder to see because we have a single curve fit. We could also just take it and put it in the overlay area, which is the most common area that I typically use, and what this does is this actually fits an individual cubic spline for each of my various formulations. That's kind of nice. And we can see that three of them are quite similar, except for one is different. The green one. The green one is Test B. Okay, so let's remember that, Test B is different than the others. And standard is just one of the various formulations that are being looked at. So it looks like they're trying to compare three different tests against the standard, which is interesting. Okay, so I'm going to close this down. Now what sort of curve fit functions do we have for these nonlinears? So under analyze, specialized modeling, you have two that we can use. One is called curve fit and one is called nonlinear. Both of these will work for nonlinear data. And let's start with the curve fit function. So in the curve fit, we want to put some sort of y response toxicity in our y, and log concentration for our regressor. And initially, if I just say okay, what we get looks just like what we had in Graph Builder. There's one exception here and that exception is I can come up here under the red triangle linear, quadratic, cubic and so on. Sigmoidals logistics, probits, Gompertz. It doesn't tell you what the functions really look like or what their equations are. You have to know the right one. Well, I happen to know that I want the Hill function, and that's hidden here in the sigmoid curves, logistic curves, and here it is, fit logistic 4 parameter Hill. That's the function that I would like to use. So I can click on that and I get what looks a lot like what I had in Graph Builder, except now I actually have parameter estimates down at the bottom. 
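As a side note, the same launch can be scripted in JSL in one step. This is a rough sketch only: the fit message name and the sample table's column names below are from memory and may differ slightly by JMP version, so treat them as assumptions to check against the Scripting Index.

dt = Open( "$SAMPLE_DATA/Bioassay.jmp" );   // assumed name of the built-in bioassay table
dt << Fit Curve(
	Y( :Toxicity ),
	X( :Log Conc ),                         // assumed column names
	Group( :Formulation ),
	Fit Logistic 4P Hill()                  // the menu item chosen in the demo
);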
Remember, we had a lower asymptote; an upper asymptote; a growth rate, which is the slope; and the inflection point, that's the EC50 value, that's the point halfway between the top and the bottom on the x axis. And those are the estimates. This is nice, but I really want separate graphs for each of my different formulations, so I'm going to redo this. I'm going to relaunch this except I'm going to add formulation. Now I could put formulation in the by category or in the group category. If I put in the by category and I click OK, I get four separate curves. That's not bad. And if I hold down my control key and I click on the red triangle for any of them, and I go to sigmoid logistic, 4 parameter Hill, what it will do is it will actually fit a curve for each of the four separately and give me the estimates for each of the four. So there's the first, one standard; here's Test A, its estimates; there's Test B with its estimates. Notice the asympts are different. And Test C. This is nice, but not still not exactly what I'm looking for. So I'm going to actually close this. I'm going to relaunch the analysis, except in this particular time, I'm going to take the formulation and put in the group category. And once again, by doing that, now I see all four are kind of overlaid on top of each other. And I can come up here and click on sigmoid, logistic, 4 parameter hill, and now what it shows me as the four different curve fits overlaid on top of each other in the plot. I can also get the parameter estimates for those four right here. So these should be identical to what we saw in the last screen. But visually now, I can take a look at these plots for the four overlaid on top of each other and see how they look. Do they look similar to each other or not? So this is, this is pretty good. I mean, this is, this might be good enough for what you might need. And if you want to pull these estimates out of this particular parameter... parameter estimates, I can right click on it and I can say make into a data table, which then allows me to take this data table with the estimates in it, and I could do something with that, whatever I happened to want. So that's, that's good to know. I'm going to close that out. So let's take a quick look at the nonlinear platform. So, analyze, specialized modeling, nonlinear. This looks similar. And if I put toxicity in my y response, I'll say formulation in my group, and log concentration in my x, and I say, okay. I get, oh wait, this fits...this says fit curve. We just did a fit curve, didn't we? And yes we did actually. This is identical to if I come up here under analyze, screening, fit curve, I say recall and I say, okay, they are identical. If I don't do anything different in the nonlinear platform, I actually end up in the fit curve platform. So what can we do that's different than the specialized modeling nonlinear? I'm going to see recall pull everything back in notice there's a model library on the left. And also notice this x could be a predicter formula, not just x values. So if I click on model library, I have a lot of models that I could choose from. And once again, I don't really know what these are. But notice, if I click on one, I can get a function, I can even say show graph. And it gives me a picture of it. Oh, that looks like something like I'm looking for. But it's kind of flipped in the wrong direction. So, this, this might work for me. Um, but what I don't see here is one called the Hill function. I see the Rodbard function. 
That's the five parameter, Richard. Where was it? Rodbard models here, that's similar, but there's no Hill function in all of it. So it's not exactly what I want. But it does allow me to do some things. The one thing that the nonlinear platform lets me do is it allows me to actually specify parameters themselves and lock them in. So what do I mean by that? Well, let's just say that I go to model library and I say logistic, 4 parameter. And I say, is it make formula, I believe? Toxicity here. Log concentration here. Oops. Formulation log concentration here and I say, okay, and this is standard. Nice. Okay. It actually does fit 4 parameters using a function that's not quite the function that I'm looking for. And you know this is not bad. But here, notice I can actually, like, change these different parameters using the sliders. That's kind of interesting. So I can say make formula and what it did is it actually put a formula on my data table here. If I take a look at the formula, it's over here and it has parameters with initial values and it has this big long equation for all four fit in the formula. So instead I can actually come back here now and I can put that in my x predicter and I can say okay. And what it does is it comes up and shows me all these, you know, the four functions and what the initial values were. If you click Go, it'll actually try and fit these. And notice that actually it did fit them in a count of four iterations. Pretty quick, actually, where there's a limit of 60, it fit them and fit them well. But notice I have in here the ability to change and I can change via sliders down here. Okay. Or I could change up here using actual...I could type in actual values, but I can change the current values and I can lock them in. This can be rather helpful and I will demonstrate that next with my ill-behaved data. I'm going to close this out and I'm going to close, get rid of this particular column. Okay, so we have this data set, our initial data set still there. Let's take a look at our ill-behaved data now. What I'm gonna do is I'm going to get rid of every, every other row of data and all of the low end concentration data points. So I'm going to push this button, thin data and eliminate lowest points. Every other one is gone, and now it's going to get rid of all the lowest ones. So what exactly does that do to our data? If we take a look at it. Let's just go over to fit model really quick, specialized, fit curve, excuse me, fit curve. And so recall. We only have the highest five points on the curve now. If I come up here and I say fit curve, and I do sigmoid, logistic...sigmoid, logistic, Hill. And it actually fit curves to those five data points separately. But notice something, I bring this up and bring this way out. Notice my lower asymptotes here are just completely different from one another. There's a third one. And if I keep going, eventually I'll get to the fourth one, which is way down here at like minus 80. So four lower asymptotes that are just completely different. So it doesn't make any sense, right? It fits the top part of the curves well, but it really, it really doesn't fit the bottom part of the curves well at all. The tops look pretty good but the bottoms don't. So rather than extrapolating, oftentimes when we're running these assays, the client, user, what they'll do is they'll put blanks, so material on the plate that has no concentration on purpose. This is usually some sort of background material that's in the assay itself. 
And in this particular case, I happen to know that blanks were used and the average of those blanks, as I said down here, was .55. So really, somewhere over here, by the time we get down there, all of the lower asymptotes should actually come down to .55. But I can't change that here in the fit curve platform. Ah, but I can in the nonlinear platform. But I don't have the Hill function in the nonlinear platform, so this gets kind of confusing. But we can get around this problem. Using the fit curve platform, once I fit my model, the logistic 4P Hill, I can actually save a formula. I can save a prediction formula, or I can save a parametric prediction formula, and there's a difference here. The prediction formula saves these exact functions just exactly the way that they were, and the parametric prediction formula is a little bit different. In this particular case, this is the parametric prediction formula. If we take a look at it, what it shows is the four equations, just as I thought they would be — so, you know, if the formulation is standard, use this; if it's Test A, use this — and down here are all the parameters. And actually, if one takes a look at this, if I was to copy this and paste it into another document... I'll do that really quick and I come up here and I say File, New... will a journal work? A journal doesn't work. That's OK. So let's have a new script. Paste. There we go. Notice at the top are all of my parameters. At the bottom is the formula that we were seeing over here on the right. So everything comes over, but these are initial values with the function itself. I just wanted to show that because it's kind of hidden unless you understand the parameters. But with this, I can now come over (let me back up, sorry) to analyze, specialized modeling, nonlinear, and I can use this predictor in my x value. My y can be my toxicity; formulation is my group; I say okay, and now here are those same five data points per curve, and the strange looking plots. But notice I have the ability, again, to change things. I know that the c parameter is my lower asymptote, so I can change each of these to .55 and I can lock that in. So .55. By doing this, you're seeing the curves actually changing. It's not fitting them yet, but it's allowing us to force a value that we believe is the correct value. So I bring this up and bring it over, and what you'll see is that it made all of them .55 for the lower asymptotes. And now I can click on Go, which is actually going to do the fitting, and what you see is that in just seven iterations it fit the rest of the parameters to those four curves, such that the lower asymptotes are all .55. That's great. That's exactly what I want in this particular case. And so once again under the red triangle, I can save a formula. In this case, I can't save the parametric prediction; I can just save the prediction, but I can do that, so I can use that over here. And where might I use that? Remember, I said that if I come over here under Graph Builder and I put my toxicity on my y axis, log concentration on the x, and formulation, maybe, in my overlay, this is what I see. I can also bring over here the formula itself, and sometimes this can be useful. In this case they look really, really similar.
What I can do is actually take away the curve from my points, and the smoother can be on the formula itself. So this is the direction. And so actually, these are the curves that belong to the function themselves, not the smoother. So this is one way to actually show the correct curves for this data set, even when the data is ill-behaved. And I'm saying ill-behaved here because we just don't have a lower asymptote, but we have something that we can use in place of it. So that's what I wanted to show you, and I think this is really kind of cool. The thing is, you have to be able to go back and forth at times between the fit curve and the nonlinear platforms to get what you really want out of JMP, out of the functionality that you really need (the short script sketch below recaps this workaround), but I think this is really great. So you could clear the row states, for instance, and you could actually show all of the data (I guess I should have wrapped this up, but I didn't) — the various formulations in overlay, log concentration — and these are the curves, but you could actually use the prediction formula instead to get the actual formulas that you want. And this is really kind of nice to see. It's not something that is really talked about. There is a blog link in the JMP discussions that Mark Bailey and somebody else put out, I think, two years ago, that describes a lot of this methodology. I just found it yesterday, long after I had already figured it out myself, but I thought it was worth sharing with everybody how we can fit these nonlinear models, especially in the dose-response world for these different biologics or pharmaceuticals, especially with all the talk these days of COVID-19 and the amount of work going on in this area. So with that, I guess I want to say thank you. Last but not least, I have some contact information. I am Martin in the JMP discussion forums and I post out there somewhat frequently. My email address is also listed down here if you have any questions for me. So with that, thank you very much and I appreciate your time. Happy to take any questions. Thank you.
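To recap the workaround from this talk in script form: the "lock the lower asymptote, then refit" idea can be sketched by writing a parametric column formula with c hard-coded at the blank average and only a, b and d left as free parameters, then sending that column to the nonlinear platform. This is a sketch under assumed column names and arbitrary starting values, not the exact formula saved in the demo.

dt = Current Data Table();
dt << New Column( "Hill c fixed", Numeric, "Continuous",
	Formula(
		Parameter( {a = 2, b = 0.1, d = 25},                       // starting guesses: slope, log-scale inflection, upper asymptote
			0.55 + (d - 0.55) / (1 + Exp( -a * (:Log Conc - b) ))  // lower asymptote locked at the blank average
		)
	)
);
Nonlinear( Y( :Toxicity ), X( :Hill c fixed ), Finish );           // fit only the remaining parameters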
Kelci Miclaus, Senior Manager Advanced Analytics R&D, JMP Life Sciences, SAS   Reporting, tracking and analyzing adverse events occurring to patients is critical in the safety assessment of a clinical trial. More and more, pharmaceutical companies and the regulatory agencies to whom they submit new drug applications are using JMP Clinical to help in this assessment. Typical biometric analysis programming teams may create pages and pages of static tables, listings and figures for medical monitors and reviewers. This leads to inefficiencies when the doctors that understand medical impacts of the occurrence of certain events can not directly interact with adverse event summaries. Yet even simple count and frequency distributions of adverse events are not always so simple to create. In this presentation we focus on key reports in JMP Clinical to compute adverse event counts, frequencies, incidence, incidence rates and time to event occurrence. The out of the box reports in JMP Clinical allow fully dynamic adverse event analysis to look easy even while performing complex computations that rely heavily on JMP formulas, data filters, custom-scripted column switchers and virtually joined tables.      Auto-generated transcript...   Speaker Transcript Kelci J. Miclaus Hello and welcome to JMP Discovery Online. Today I'll be talking about summarizing adverse event summaries and clinical trial analysis. I am the Senior Manager in the advanced analytics group for the JMP Life Sciences division here at SAS, and we work heavily with customers using genomic and clinical data in their research. So before I go through the summarizing and the details around using JMP with adverse event analyses, I want to introduce the JMP Clinical software which our team creates. JMP Clinical is one of the family of products that includes now five official products as well as add ins, which can extend JMP to really allow you to have as many types of vertical applications or extensions of JMP as you want. My development team supports JMP Genomics and JMP Clinical. JMP Genomics and JMP Clinical are respectively vertical applications that are customized, built on top of JMP, that are used for genomic research and clinical trial research. And today I'll be talking about how we've created reviews and analyses in JMP Clinical for pharmaceutical industries that are doing clinical trials safety and early efficacy analysis. The original purpose of JMP Clinical and the instigation of this product actually came through assistance to the FDA, which is a heavy JMP user And their CDER group, the Center for Drug Evaluation and Research. Their medical reviewers were commonly using JMP to help review drugs submissions. And they love it. They're very accomplished with it. One of the things they found though is that certain repetitive actions, especially on very standard clinical data could be pretty painful. Example here is the idea of something called a shift plot which is for laboratory measurements where you compare the trial average of a laboratory of versus the baseline against treatment groups. In order to create this, it took at least eight to 10 steps within the JMP interface of opening up the data, normalizing the data, subsetting it out into baseline versus trial, doing statistics, respectively, for those groups merging it back in, then splitting that data by lab tests so you could make this type of plot for each lab. And that's not even to get to the number of steps within Graph Builder to build it. 
So JMP clearly can do it, but what we wanted to do is solve their pain with this very standard type of clinical data with one-click lab shift plots, for example. In fact, we wanted to create clinical reviews in our infrastructure, which we call the review builder, that are one-click, standardized, reproducible reviews for many of the highly common standard analyses and visualizations that are required or expected in clinical trial research to evaluate drug safety and efficacy. So JMP Clinical has evolved since that first instigation of creating a custom application for a shift plot into a full-service clinical trial analysis software that covers medical monitoring and clinical data science, medical writing teams, biometrics and biostatistics, as well as data management around the study data involved with clinical trial collection. This goes for both safety and efficacy, but also operational integrity — operational anomalies that might be found in the collection of clinical data as well. Some of the key features around JMP Clinical that we find to be especially useful for those using the JMP interface for any type of analysis are things like virtual joins. So we have the idea of a global review subject filter, which I'll show you during the demonstrations for adverse events, that really allows you to integrate and link the demography information — the demographics about our subjects on a clinical trial — to all of the clinical domain data that's collected. And this architecture, which is enabled by virtual joins within the JMP interface with row state synchronization, allows you to have nearly instantaneous, interactive reviews with very little to no data manipulation across all the types of analyses you might be doing in a clinical trial data analysis. Another new feature we've added to the software, which also leverages some of the power of the JMP data filter as well as the creation of JMP indicator columns, is the ability, while you're interactively reviewing clinical trial data, to find interesting signals — say, in this example, the screenshot shown is subjects that had a serious adverse event while on the clinical trial — and quite immediately create an indicator flag that is stored in metadata with your study in JMP Clinical and is available for all other types of analyses you might do. So you can say, I want to look now at my laboratory results for patients that had a serious adverse event versus those that didn't, to see if there are also anomalies that might be related to an adverse event severity occurrence. Another feature that I'll also be showing with JMP Clinical in the demonstration around adverse event analysis is the JMP Clinical API that we've built into the system. One of the most difficult things about providing, creating and developing a vertical application that has out-of-the-box, one-click reports is that you get 90% of the way there and then the customer might say, oh, well, I really wanted to tweak it, or I really wanted to look at it this way, or I need to change the way the data view shows up. So one of the things we've been working hard on in our development team is using the JMP Scripting Language (JSL) to surface an API into the clinical review, to have control over the objects and the displays and the dashboards and the analyses and even the data sets that go into our clinical reviews. So I'll also be showing some of that in the adverse event analysis.
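As a side note on the virtual join mechanism mentioned above: in plain JMP it comes down to two column properties. The sketch below uses placeholder file and column names, and the property syntax is from memory, so treat it as an assumption and verify against the current Scripting Index rather than as JMP Clinical's internal code.

dm = Open( "$DESKTOP/DM.jmp" );                // demography: one row per subject
dm:USUBJID << Set Property( "Link ID", 1 );    // mark the subject identifier as the key

ae = Open( "$DESKTOP/AE.jmp" );                // stacked adverse event records
ae:USUBJID << Set Property( "Link Reference",
	Reference Table( "DM.jmp" ) );             // demography columns (and row states) become usable in AE analyses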
So let's back up a little bit and go into the meat of adverse events in clinical trials, now that we have an overview of JMP Clinical. There are really two key ways of thinking about this. There's the safety review aspect of a clinical trial, where it's typically counts and percentages of the adverse events that might occur. A lot of the medical doctors, monitors, or reviewers often use this data to understand medical anomalies — you know, a certain adverse event starts showing up more commonly with one of the treatments, which could have medical implications. There's also statistical signal detection: the idea of statistically assessing whether adverse events are occurring at an unusual rate in one of the treatment groups versus the other. So here, for example, is a traditional static table that you see in many of the types of research or submissions or communications around a clinical trial adverse event analysis. Basically it's a static table with counts and percents, and if it is more statistically oriented, you'll see things like confidence intervals and p-values as well, around things like odds ratios, relative risks or rate differences. Another way of viewing this can be visually instead of in a tabular format, so signal detection — looking at, say, the odds ratio or the risk difference — might use Graph Builder, in this case, to show the results of a statistical analysis of the incidence of certain adverse events and how they differ between treatment groups, for example. So those are two examples. And in fact, from the work we've done and the customers we've worked with around how they view and have to analyze adverse events, the JMP Clinical system now offers several common adverse event analyses, from simple counts and percentages, to incidence rates or occurrences, to statistical metrics such as risk difference, relative risk and odds ratio, including some exposure-adjusted time-to-event analyses. We can also get a lot more complex with the types of models we fit and really go into mixed or Bayesian models as well in finding certain signals with our adverse event differences. And we also use this data heavily in reviewing the medical data in either a medical writing narrative or a patient profile. So now I'm going to jump right into JMP Clinical with a review that I've built around many of these common analyses. One of the things you'll notice about JMP Clinical is it doesn't exactly look like JMP, but it is — it's a combined, integrated solution that has a lot of custom JSL scripting to build our own types of interfaces. So our starter window here lays out studies, reviews, and settings, for example. And I already have a review built here that is using our example nicardipine data. This is data that's shipped with the product; it's also available in the JMP sample library. It's a real clinical trial looking at subarachnoid hemorrhage, with about 900 patients. And so what this first tab of our review is looking at is just the distribution of demographic features of those patients: how many were males versus females, their race breakdowns, what treatment group they were given, the sites the data was taken from, etc. So this is very common, just as the first step of understanding your clinical data for a clinical trial. You'll notice here we have a report navigator that shows the rest of the types of analyses that are available to us in this built review.
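The demography tab just described is, underneath, essentially a one-line Distribution call. A generic sketch with assumed CDISC-style column names (SEX, RACE, ARM), not the actual study table:

dm = Current Data Table();          // one-row-per-subject demography table
dm << Distribution(
	Nominal Distribution( Column( :SEX ) ),
	Nominal Distribution( Column( :RACE ) ),
	Nominal Distribution( Column( :ARM ) )
);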
I'm going to walk through each of these tabs quickly to show you all the different flavors of ways we can look at adverse events with the clinical trial data set. Now, the typical way data is collected in clinical trials is an international standard called CDISC format, which typically means that we have a very stacked data set format. Here we can see it, where we have multiple records for each subject indicating the different adverse events that might have occurred over time. This data is going to be paired with the demography data, which is one row per subject, as seen here in this demography table. So we have about 900 patients, and you'll see in this first report we have about 5,000 or 5,500 records of different adverse events that occurred. This is probably the most commonly used report by many of the medical monitors and medical reviewers that are assessing adverse event signals. What we have here is basically a dashboard that combines a Graph Builder counts plot with an accompanying table, since they are used to seeing these kinds of tables. Now the real value of JMP is its interactivity and that dynamic link directly to your data, so that you can select anywhere in the data and see it in both places. Or, more powerfully, you can control your views with column switchers. Here we can actually switch from looking at the distribution of treatments to sex versus race. You'll notice with race, if we remember, we had quite a few subjects that were white in this study, so this isn't a great plot when we look at it by counts, so we might normalize and show percents instead. And we can also just decide to look at the overall holistic counts of adverse events as well. Another part of using this column switcher is the ability to categorize what kind of events those were. Was it a serious adverse event? What was the severity of it? What was the outcome; did they recover from it or not? What was causing it? Was it related to study drug? All of these are questions that medical reviewers will often ask to find interesting or anomalous signals in adverse events and their occurrences. Now, one of the things you might have already noticed in this dashboard is that I have a column switcher here that's actually controlling both my graph and my table. So when I switch to severity, this table switches as well. This was done with a lot of custom JSL scripting specifically for our purposes, but I'll tell you a secret: in JMP 16, the column switcher is going to allow this type of flexibility, so you can tie multiple platform objects to the same column switcher to drive a complex analysis. I'm going to come back to this occurrence plot, even though it looks simple. Here's another instance of it that's actually looking at overall occurrence, where certain adverse events might have occurred multiple times for the same subject. I'm going to come back to these, but first quickly go through the rest of the analyses in these reviews before coming back to some of the complexities of the simple Graph Builder and Tabulate distribution reports. The next section in our review here is an adverse event incidence screen. So here we're making that progression from just looking at counts and frequencies, or possibly incidence rates, into a more statistical framework of testing for the difference in incidence of certain adverse events in one treatment group versus another. And here we are representing that with a volcano plot.
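A stripped-down sketch of that kind of dashboard piece in JSL might look like the following. This is not JMP Clinical's code; the CDISC-style column names (AEDECOD, ARM, AESEV, AESER) and the column switcher message form are assumptions on my part.

    // Stripped-down sketch: bar chart of adverse event counts with a column switcher.
    // Column names are CDISC-style stand-ins, not JMP Clinical's actual implementation.
    ae = Current Data Table();
    gb = ae << Graph Builder(
        Variables( X( :AEDECOD ), Overlay( :ARM ) ),
        Elements( Bar( X ) )
    );
    // Assumed message form: swap the overlay column among several classifiers.
    gb << Column Switcher( :ARM, {:ARM, :AESEV, :AESER} );

Tying one switcher to several platform objects at once is the piece that currently needs custom scripting, as described above.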
So we can see that phlebitis, hypotension, and isosthenuria occur much more often in our treatment group, those that were treated with nicardipine, versus those on placebo. So we can actually select those and drill into a very common view for adverse events, which is our relative risk forest plot, which is a lot of times still easier to read when you're only looking at those interesting signals that have possibly clinically or statistically significant differences. Sometimes clinical trials take a long time. Sometimes subjects are on them for a few weeks, like this study, which was only a few weeks, but sometimes they're on them for years. So sometimes it's interesting to think of adverse event incidence differences as the trial progresses. We have this capability as well within the incidence screen report, where you can chunk the study days into sections to see how the incidence of adverse events changes over time. And a good way to demonstrate that might be with an exploding volcano plot here that shows how those signals change across the progression of the study. Another powerful idea with this, especially as you have longer or more complex clinical trials, is that instead of looking at just direct incidence among subjects, you can consider their time to event, or the exposure-adjusted rate at which those adverse events are occurring. And that's what we offer within our time-to-event analyses, which are, once again, shown in a volcano plot, here using a Kaplan-Meier test to look at differences in the time to event of certain events that occur on a clinical trial. One of the nice things here is that you can select these events and drill down into the JMP survival platform to get the full details for each of the adverse events that had perhaps different time-to-event outcomes between the treatment groups. Another flavor of time to event is often called an incidence density ratio, which is the idea of exposure-adjusted incidence density. Basically the difference here is that instead of using some of the more traditional proportional hazards or Kaplan-Meier analyses, this is more like a Poisson-style model that's adjusted for how long subjects have actually been exposed to a drug. And once again, here we can look at those top signals and drill down to the analogous report within JMP, using a generalized linear model for that specific type of adverse event signal detection. And we actually even offer some really complex Bayesian analyses. One of the things with this type of data is that adverse events typically exist within certain body systems or organ classes, and so there is a lot of prior knowledge that we can build into these models. And so some of our customers' biometrics teams decide to use pretty sophisticated models when looking at their adverse events. So far we've walked from what I would consider pretty simple distribution views of the data, into distributions and count plots of adverse events, into very complex statistical analyses. I'm going to come back now to what is considered simple count and frequency information, and I want to spend some time here showing the power of the JMP interactivity that we have. As you recall, one of the differences here is that this table is a stacked table that has all of the occurrences of our adverse events for each subject, and our demography table, which we know has 900 subjects, is separate.
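The drill-down lands in the standard JMP Survival platform. As a rough stand-alone illustration (not the JMP Clinical drill-down itself), a Kaplan-Meier comparison could be launched like this, with hypothetical column names:

    // Rough sketch: Kaplan-Meier comparison of time to first occurrence of one
    // adverse event between arms. Table and column names are hypothetical.
    tte = Current Data Table();
    tte << Survival(
        Y( :Time to Event ),      // days to first occurrence or last follow-up
        Censor( :Censor ),        // 1 = censored (no event observed)
        Grouping( :ARM )          // treatment arm; gives survival curves and log-rank tests
    );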
So what we wanted was not a static graph, like we have here, or what we would have in a typical report in PDF form; we wanted to be able to interactively explore our data and look at subgroups of our data and see how those percentages would change. Now, the difficulty is that the percent calculation needs to come from the subject count in a different table. So we've actually done this by creating column formulas that dynamically control recalculation of percents upon selection, either within the event categorizations or, more powerfully, using our review subject filter tool. So here, for example, we're looking at all subjects by treatment, perhaps serious versus not serious adverse events, but we can use this global data filter, which affects each of the subject-level reports in our review, and instantaneously change our demography groups and change our percentages to be interactive to this type of subgroup exploration. So here we can actually subgroup down to white females and see what their adverse event percentages and counts are, or perhaps you want to go more granular and understand how the data changes for different sites. So what we really have here, instead of a submission package or a clinical analysis where the biometrics team hands 70 different plots and tables to the medical reviewer to sift through, is the power to create hundreds of different tables and different subsets and different graphics, all in one interface. In fact, you can really filter down into those interesting categories. So if they were looking, say, at serious adverse events and wanted to know which serious adverse events were related to drug treatment, very quickly we get down from our 900 patients to a very small subset of about nine patients that experienced serious adverse events considered related to the treatment. As a medical reviewer, this is a place where I then might want to understand all of the clinical details about these patients. And very quickly, I can use one of our action buttons from the report to drill down to what's called a complete patient profile. So here we see all of the information, now at an individual subject level instead of a summary level, of everything that occurred to this patient over time, including when serious adverse events occurred and the laboratory or vitals measurements that were taken alongside them. One of the other main users of our JMP Clinical system, along with medical review and medical monitoring, is medical writing teams. So another way of looking at this, instead of visually in a graphic or even in a table like these patient profile tables, is that you can actually go up here and generate an automated narrative. So here we're going to launch our adverse event narrative generation. Again, one of the benefits and values of JMP Clinical being a vertical application relying on standard data is that we get to know all the data and the way it is formatted up front, just by being pointed to the study. So what we can do here is run this narrative, which is going to write the actual story of each of those adverse events that occurred. And this is going to open up a Word doc that has all of the details for this subject, their demography, their medical history, and then each of the adverse events and the outcomes or other issues around those adverse events.
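To make the cross-table percent idea concrete, here is a rough JSL sketch of computing event counts per term with the subject denominator taken from a separate demography table. It is only the general pattern (it counts occurrences rather than distinct subjects), not the formula-driven machinery JMP Clinical actually scripts; table and column names are hypothetical.

    // Rough sketch: events per term from a stacked AE table, with the subject
    // denominator taken from a one-row-per-subject demography (DM) table.
    // Names are hypothetical; this counts occurrences, not distinct subjects.
    ae = Data Table( "AE" );
    dm = Data Table( "DM" );
    nSubjects = N Rows( dm );
    Summarize( ae, term = By( :AEDECOD ), cnt = Count );
    pctTable = New Table( "AE Percent",
        New Column( "AE Term", Character, "Nominal", Values( term ) ),
        New Column( "N Events", Numeric, "Continuous", Values( cnt ) )
    );
    pctTable << New Column( "Pct of Subjects", Numeric, "Continuous",
        Set Each Value( 100 * :N Events / nSubjects )   // denominator from the DM table
    );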
And we can do this for one patient at a time or we can actually even do this for all 900 patients at a time and include more complex details like laboratory measurements, vitals, either a baseline or before. And so, medical reviewers find this incredibly valuable be able to standardly take data sources and not make errors in a data transfer from a numeric table to an actual narrative. So I think just with that you can really see some of the power of these distribution views, these count plots that allow you to drill into very granular levels of the data. This ability to use subject filters to look either within the entire population of your patients on a clinical trial or within relevant subgroups that you may have found. Now one thing about the way our global filter works through our virtual joins is this is only information that's typically showing the information about the demography. One of the other custom tools that we've scripted into this system is that ability to say, select all subjects with a serious adverse event. And we can either derive a population flag and then use that in further analyses or we can even throw that subject's filter set to our global filter and now we're only looking at serious...at a subject who had a serious adverse event, which was about...almost 300 patients on the clinical trial had a serious adverse event. Now, even this report, you'll see is actually filtered. So the second report is a different type of aspect of a distribution of adverse events that was new in our latest version which is incidence rates. And here, the idea is instead of normalizing or dividing to calculate a percent by the number of subjects who had an event. If you are going with ongoing trials or long trials or study trials across different countries that have different timing startup times, you might want to actually look at the rate at which adverse events occur. And so that's what this is calculating. So in this case, we're actually subset down to any subjects that had a serious adverse event. And we can see the rate of occurrence in patient years. So for example, this very first one, see, has about a rate of 86 occurrences in every 10 patient years on placebo versus 71 occurrences In nicardipine. So this was actually one which this was to treat subarachnoid hemorrhage, intracranial pressure increasing likely would happen if you're not being treated with an active drug. These percents are also completely dynamic, these these incidence rates. So once again, these are all being done by JMP formulas that feed into the table automatically that respect different populations as they're selected by this global filter. So we can look just within say the USA and see the rates and how they change, including the normalized patient years based on the patients that are from just the USA, for example. So even though these reports look pretty simple, the complexity of JSL coding that goes beyond building this into a dashboard is basically what our team does all day. We try to do this so that you have a dashboard that helps you explore the data as you know, easily without all of these manipulations that could get very complex. Now the last thing I wanted to show is the idea of this custom report or customized report. So this is a great place to show it too, because we're looking here at adverse events incidence rates. And so we're looking by each event. And we have the count, or you can also change that to that incidence rate of how often it occurs by patient year. 
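The patient-year normalization mentioned here is simple arithmetic; a sketch with made-up numbers:

    // Sketch of the incidence-rate arithmetic: occurrences per 10 patient-years
    // for one arm. The numbers are made up for illustration.
    nEvents      = 120;               // occurrences of a given event in this arm
    exposureDays = 5100;              // summed days on study across subjects in this arm
    patientYears = exposureDays / 365.25;
    ratePer10PY  = 10 * nEvents / patientYears;
    Show( ratePer10PY );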
And then an alternative view might be really wanting to see these occurrences of adverse events across time. And so I want to show that really quick with our clinical API. So the data table here is fully available to you. One of the things I need to do first off is just create a numeric date variable, which we have a little widget for doing that in the data table, and I'm going to turn that into a numeric date. Now you'll notice now this has a new column at the end of the numeric start date time of the adverse event. You'll also notice here is where all that power comes from the formulas. These are all actually formulas that are dynamically regenerated based on populations for creating these views. So now that we have a numeric date with this data, now we might want to augment this analysis to include a new type of plot. And I have a script to do that. One of the things I'm going to do right off the bat is just create a couple extra columns in our data set for month and year. And then this next bit of JSL is our clinical API calls. And I'm not going to go into the details of this except for that it's a way of hooking ourselves into the clinical review and gaining access to the sections. So when I run this code, it's actually going to insert a new section into my clinical review. And here now, I have a new view of looking at the adverse events as they occurred across year by month for all of the subjects in my clinical trial. So one of the powers, again, even with this custom view is that this table by being still virtually joined to our main group can still fully respond to that virtual join global subject filter. And so just with a little bit of custom API JSL code, we can take these very standard out-of-the-box reports and customize them with our own types of analyses as well. So I know that was quite a lot of an overview of both JMP Clinical but, as well as the types of clinical adverse event analyses that the system can do and that are common for those working in the drug industry or pharma industry for clinical trials, but I hope you found this section valuable and interesting even if you don't work in the pharma area. One of the best examples of what JMP Clinical is is just an extreme extension and the power of JSL to create an incredibly custom applications. So maybe you aren't working with adverse events, but you see some things here that can inspire you to create custom dashboards or custom add ins for your own types of analyses within JMP. Thank you.  
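The review insertion itself goes through the JMP Clinical API, but the data table preparation described here is ordinary JSL. A sketch, assuming a CDISC-style character start date column named AESTDTC:

    // Sketch of the data table prep only (the review/API calls are JMP Clinical specific).
    // Assumes a character CDISC-style start date column AESTDTC such as "1998-07-21".
    ae = Current Data Table();
    ae << New Column( "AE Start Date", Numeric, Continuous,
        Format( "yyyy-mm-dd" ),
        Formula( Informat( :AESTDTC, "yyyy-mm-dd" ) )
    );
    ae << New Column( "AE Year",  Numeric, Ordinal, Formula( Year( :AE Start Date ) ) );
    ae << New Column( "AE Month", Numeric, Ordinal, Formula( Month( :AE Start Date ) ) );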
Bill Worley, JMP Senior Global Enablement Engineer, SAS   In the recent past, partial least squares has been used to build predictive models for spectral data. A newer approach using Functional Data Explorer and covariate design of experiments will be shown that allows fewer spectra to be used in the development of a good predictive model. This method uses one-fourth to one-third of the data that would otherwise be used to build a predictive model based on spectral data. Newer multivariate platforms like Model Driven Multivariate Control Chart (MDMCC) will also be shown as ways to enhance spectral data analysis.   (view in My Videos)   Auto-generated transcript...   Speaker Transcript Bill Worley Hello everyone, my name is Bill Worley and today we're going to be talking about analyzing spectral data. I'm going to talk about a few different ways to do it. One is using Functional Data Explorer and design of experiments to help build better predictive models for your spectral data. The data set I'm going to be using is actually out of a JMP book, Discovering Partial Least Squares. I will post this on our Discovery website or community page, so everything will be out there for you to use. First and foremost, I'll talk about the different things we're going to look at. Traditionally, when you're looking at spectral data, you're going to use partial least squares to analyze it, and that's fine; it really works very well. But there are some newer approaches to try out. One is using principal components and then a covariate design of experiments and partial least squares to analyze the data. And then an even newer, more novel approach uses Functional Data Explorer, then the covariate design of experiments and partial least squares, with an opportunity to use something like generalized regression or neural networks. Okay, so I'm going to go through a PowerPoint first to give you a little bit of background. Again, we're going to be talking about using Functional Data Explorer and design of experiments to build better predictive models for your spectral data. A little bit of history: the spectral data approach is based on a QSAR-like material selection approach that was developed previously by gentlemen named Silvio Michio and Cy Wegman. So I took it and looked for opportunities to apply this approach to other highly correlated data. The first thing that really came out was spectral data, which is truly highly correlated, almost autocorrelated, data where we can use this approach. The data that I've got is, again, continuous response data for octane rating, but I've since added mass spectral data and near IR data for categorical responses as well. This is where we're going to go: we're going to build these models and compare them. This is the traditional PLS approach, this is the newer approach using principal components, and then the final approach here is using Functional Data Explorer, and you can see that for the most part we really don't lose anything with these models as we build them. As a matter of fact, this slide is a little bit older; the models that I've built more recently are actually a little bit better. So we'll get there. We'll show you that when we get there. So again, a twist on analyzing your spectral data; partial least squares has been used in the past.
We're going to be applying several multivariate techniques to show how to build good predictive models with far fewer conditions. When I say far fewer conditions, in this case I mean fewer spectra; you'll see where that comes from. And why would you want to do this analysis differently? Well, first and foremost, there's a huge time savings. You get an as good or better predictive model with 25% of the data or less; it's your choice. And then you can use penalized regression to help determine the important factors, making for simpler models. When I say important factors, I mean important wavelengths, and again, you'll see that when we get there. This is looking at 60 gasoline near IR spectra overlaid, and as we all know, it would be pretty hard to build a predictive model from this to determine the difference between the different spectra for their octane rating. So what we're going to do is use JMP to help get us there. Most of what I'm going to be showing can be done in regular JMP, but what I'm going to be showing today is almost all JMP Pro, just so you know. So, how it's done. I'm not going to read you the different steps; I'll let you do that if you so choose. But there are two important ones. The first is number two, where you want to identify the prominent dimensions in the spectra, and that's where we're going to use functional principal components from the Functional Data Explorer. It's not used in the traditional sense, because we're not going to build models using these things; we're going to use these functional principal components to help us pick which spectra we're going to analyze, and then we're going to use those in a custom design to help us select those different spectra. And last but not least, number seven here is to use this sustainable model to determine the outcome for all future samples. One caveat: I'm a chemist by training and education, an analytical chemist at that, and I don't know how well instruments hold their calibration anymore. So the model that you build will hold true as long as the instrument is calibrated and good to go from that respect. Okay. Bill Worley So some important concepts. Again, I'm not going to read these; I just want you to know that we'll be looking at partial least squares, principal components, and functional data analysis. Functional data analysis is something newer in JMP that you won't see in a lot of other places. It helps you analyze data, providing information about curves, surfaces, or anything over a continuum; that's taken from Wikipedia. A newer platform that I'm going to show, which is in regular JMP, is Model Driven Multivariate Control Chart. This allows you to see where the differences in some of the spectra are, how you can pull those apart, and maybe dive a little deeper into what these differences in the spectra really are. So with that, let's go to the demo. I go to my home window. So, the data set. This is again the gasoline data set, where we're looking at octane rating: how do we determine the octane rating for different gasolines? Where does it come from, how do we determine it? You really don't need this value until you do the preprocessing or setup that I'm going to be showing you. So we'll get there; we know those numbers are there. We'll get there as we need to. But let's look at the data first.
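A minimal JSL sketch of this kind of first look, coloring rows by octane and plotting one wavelength against the rating; the wavelength column name is a hypothetical stand-in for one of the 400 columns, and the options shown in the demo (markers, the color theme) are set interactively.

    // Minimal sketch: color rows by octane and plot one wavelength against the rating.
    // The wavelength column name is a hypothetical stand-in for one of the 400 columns.
    dt = Current Data Table();
    dt << Color by Column( :Octane );
    dt << Graph Builder(
        Variables( X( :Octane ), Y( :NIR 1200 ) ),
        Elements( Points( X, Y ) )
    );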
And that's something you want to do anyway: whenever you get a new data set, you want to look at the data and see where things fall out. So let's go to Graph Builder. We're going to pull in octane and the wavelengths and drop those down on our X axis. Before I do that, let me close that out. I want to color and mark by the octane rating, and I'm going to change the color scale to green to black to red. Say OK, and we can see those colors in the data set. Let's go back to Graph Builder and pull this back in and drop those. A little more colorful there. Now that we've got these in here, it's really hard to tell anything at all. What we saw before with the overlay was bad enough, but now we're looking at an even more jumbled grouping of points, so let's turn on the parallel plots. Alright, that kind of pulls things in, and we can see it's still a jumbled mess, but we've got another tool that will help us investigate the data a little further, and that's the local data filter. So we're going to go there, pull in sample number and octane rating, and add those. I'll stretch this out a little bit. So now we can actually go into a single spectrum, see it over here in green, and dive into those separately. I'm going to take that back off. Alright, so that's grouping, and we could actually pull this in and start looking at the different octane ratings and see which spectra are associated with the higher octane ratings or the lower ones. It's your choice; it just gives you a tool to investigate the data. Do you need to do any more preprocessing to get the spectra in line with each other, or set things up so you can see the different groupings better? Okay, so that's looking at Graph Builder, and I'm going to clear out the row states here. From here, we want to better understand what's going on with the data. Like I said, we're looking at spectral data and it's very highly collinear, or multicollinear, and this is something you may want to prove to yourself. So let's go to Analyze, Multivariate Methods, Multivariate, and we're going to select all our wavelengths. Fairly quickly, we get back these Pearson correlation coefficients, and they're all close to one, for the most part, in these early wavelengths. That's just telling us that things are very highly correlated. That's the way it is, and we need to deal with it as we go forward. Okay, so we've looked at the data, we're set up, and now we can look at another piece of information. This is newer in JMP 15, and it's also in regular JMP. We're going to go to Analyze, Quality and Process, and Model Driven Multivariate Control Chart. Again we pull in all our wavelengths and say OK. Now we're looking at the data in a different way. This is basically, for every spectrum, all 400 wavelengths, and now we can see where some of these are a little bit outside of what would be considered control, across all 400 wavelengths. That's the red line. So if we highlight those, and I right-click in there and say contribution plots for selected samples, now I can see differences in the spectra as they're compared to the other spectra in the overall data set, and we can see which parts of the spectra are considered more or less out of control.
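Both of those launches are scriptable as well. A hedged sketch with a handful of hypothetical wavelength columns standing in for all 400 (the Model Driven Multivariate Control Chart role name below is my assumption of the scripted form):

    // Sketch: pairwise correlations and a model driven multivariate control chart
    // on a few wavelength columns, standing in for all 400. Column names hypothetical.
    dt = Current Data Table();
    dt << Multivariate( Y( :NIR 900, :NIR 902, :NIR 904, :NIR 906 ) );
    dt << Model Driven Multivariate Control Chart(
        Process( :NIR 900, :NIR 902, :NIR 904, :NIR 906 )   // assumed role name
    );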
And if I can get this to work, we can get there, and then for that particular wavelength we can see that those three samples are more or less out of spec, based on this control chart, compared to the other samples. Alright, so again, that allows you to dive deeper and tells you what group it is. This is all about learning more about your data: which ones are good, which ones are bad, or which ones may be different. So it's an added tool to help you better understand where you may be seeing differences. Okay. So with that, we've got things pretty much set up, and we want to go into the analysis part. As we go into Analyze, we have to set things up so we get what we want, when we want it, and how we want to analyze it. So we're going to go to Analyze, and this is where we're going to select the samples; this is where we're going to use Functional Data Explorer to help us select the samples. So go to Analyze, Functional Data Explorer. This is a JMP Pro 15 thing. Instead of stacked data, we're going to use rows as functions. Again, we're going to select all of our wavelengths, we're going to use octane as our supplementary variable, and the sample number is our ID function. So we've got it set up, ready to go. And now for looking at the data: remember how we had everything lined up when we were looking at it before; this is all the data overlaid again. If we needed to do some more preprocessing, we could do that over here in the transform area, where we could actually center the data and standardize it. For the most part, this data is fairly clean, so we don't have to do that, and we're going to go ahead and do the analysis from here. Okay, so b-splines, p-splines, and Fourier basis. These splines will give you a decent and fairly simple model, but spectral data is so highly correlated and the wavelengths are so close together that we want to understand where we're seeing differences on a much closer basis, as opposed to something like a b-spline, which would spread the knots out. We want to keep the knots as close together as possible to help better understand what's going on. So this takes a few seconds to run, but I'm going to click p-splines. It's going to take 15 or 20 seconds, but it's going to fit all those models, and it's almost done. Alright, so now we've fit those models. If I had run a b-spline, it probably would have ended up around 20 knots; here we're looking at 200 knots. So it's basically taking those 400 wavelengths and splitting them into virtual groups of two, so it's looking at individual groups of two. And this is the best overall model based on the AICc, BIC, and negative log-likelihood scores. We could go backwards; it's a linear function, so we could go backwards and use a simpler model if we want. We could also go forward and see how many more knots it would take to get an even better model. I can tell you from experience that around 203 to 204 knots is as good as it gets, and there's no reason to really go that far for the little bit of improvement that we would get. So we fit those, and you can see we've fit all the models for all the spectra. Let's go down to our functional principal components. This is the mean spectrum right here.
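FDE's own Transform tools handle the centering and standardizing mentioned here, but the same idea can be sketched with an ordinary column formula (hypothetical column name):

    // Sketch: center and scale one wavelength column with a plain column formula.
    // FDE's Transform tools do this inside the platform; this is just the idea.
    dt = Current Data Table();
    dt << New Column( "NIR 1200 standardized", Numeric, Continuous,
        Formula( ( :NIR 1200 - Col Mean( :NIR 1200 ) ) / Col Std Dev( :NIR 1200 ) )
    );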
These are the functional principal components from that data. Each one of these eigenvalues, or functional principal components, is explaining some portion of the variation that we're seeing. You can see the first functional principal component explains about 50% of the variation, and so on and so forth. It's additive: you can see it's at about 72% by the second row, and by number four we get to our rule of thumb or cutoff point. If we can explain 85% of the variation, that's our cutoff for the number of principal components we want to grab and build our DOE off of, and so we're going to go with four. Some other things you can look at are the score plots. It looks like spectrum number five is kind of out there, and if you really wanted to look at that one, you could; you can pop that out or pin it to the graph. You get an idea of which spectra are out there and what they might look like; in this case we see some differences for 15 and five, and remember 41 was kind of out there too. But we can see some other things. The functional principal component profiler is down here. Now, if you wanted to make changes or better understand things, you would ask: as I move my functional principal components around, how do I change my data? It's hard to really visualize that. So something that's newer in JMP Pro 15 is this functional DOE analysis, and that's why I added that octane rating as our supplementary variable. So I'm going to go down here and minimize some of these things a little bit. Down here in this FDOE profiler, we've actually done some generalized regression; it's built in. So we're looking at this model, and as we look through these different wavelengths, we can see what happens with the octane as we get to them; a particular wavelength may be different for the different octane ratings, and that's what you're looking for. You want to see differences. So where can we see the biggest differences? I don't know if you saw it happen out here, but right here on the end, at the higher wavelengths, we're seeing some significant changes. So I'm going to go out here, and as you can see, the curve is bowed a little bit there, and as I go back to the lower wavelengths, this curve starts to flatten out a little bit, or actually gets a little steeper; it's not as flat as it was at the higher octane ratings. OK. So again, this is all about investigating the data. But what we're going to do is go ahead and save those functional principal components, and we'll do that through our function summaries right here. We need to customize that: I'm going to deselect all the summaries, and I'm going to put four in there because that's the number I want to save. And just as a watch-out, make sure you say OK and save; just OK is fine, but it won't get you where you want to be. So we say OK and save, and we get a new table with our functional principal component scores in there, all four of those for the different samples and the different octane ratings. Now what we have to do is get that information back to our main data table. You could do this through a virtual join; what I have done is actually copy these over, and there's a way to do this fairly simply. So I need to go back over to my main data table.
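Dragging the score columns across, as described next, works fine; a scripted alternative is an Update join on the sample ID, sketched below with hypothetical table and column names.

    // Sketch: bring the saved FPC score columns into the main table by matching
    // on the sample ID. Table and column names are hypothetical.
    main   = Data Table( "Gasoline NIR" );
    scores = Data Table( "Function Summaries" );
    main << Update(
        With( scores ),
        Match Columns( :Sample = :Sample )
    );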
If this will work for me: I don't need to keep this table, I just want to get these scores over to the main table. So you just grab them and drag them over to your main data table and drop them. I've already done it, so I'm not going to drop them in there, but that's one way to get the data over there quickly. Let me minimize that for now. So this is the data; it's right there. I've copied it over, so we've got the scores. Now we're going to do what I consider the most important step: we're going to pick the samples that we're going to analyze. This gets you down to a much smaller number of samples to build your models on, and this is where we're going to use design of experiments to get us there. So we're going to select DOE, Custom Design. Don't worry about the response. Right now we're going to add factors, and we're going to add covariate factors; you'll see in a minute why we're doing this. So I'm going to add covariates. You have to select what your covariate factors are, and we're going to choose the functional principal components and say OK. We're going to look in this functional principal component space to figure out which samples we're going to analyze to build our model. So I select Continue, and right now it's saying to select all 60 and build the model from there. Well, we want to take that down to a much smaller number; we're going to use 15. That allows us to select a smaller number: we don't have to have as many spectra, and we don't have to run as many, but you have them all and you can select from them. So I'm going to say Make Design, and while this is building... alright, we don't need this, I'm going to get rid of it; that's just some information. What we've seen now is that in our data table, 15 rows have been selected and highlighted in blue. I'm going to right-click on that blue area and put some markers on them, put a star on, and I'm actually going to color those as well, so let's make those blue. And before I forget, what you do now is take these and do a table subset. So, Tables, Subset: we've got selected rows, all columns, say OK, and this is where we're going to be doing our modeling. But before I go there, let's go back to our main data table and go to Analyze, Multivariate Methods, Multivariate. Instead of using the wavelengths, I'm going to use the functional principal components; put those in the Y role and say OK. Now look at what we saw before: we had almost complete correlation for a lot of the wavelengths, and we've taken that out of play. If you look at the space now, at the markers, the stars, we're pushing things out to the corners of our four-dimensional space, but we're also looking through the center of the space as well. So this is more or less a space-filling design, but it's spreading the points out to where we're hopefully going to get a decent model out of it. Okay. So we've got that, and I need to pull up my data table again. These are the samples that we're interested in and are going to build our model on, and I'm going to slide back over here to the beginning.
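The subset step can also be scripted; a sketch (the exact option names of the Subset message are an assumption on my part):

    // Sketch: the covariate design leaves its chosen 15 runs selected in the source
    // table; pull just those rows into a new table for modeling. Option names assumed.
    main = Current Data Table();
    doeSub = main << Subset( Selected Rows( 1 ), Output Table( "DOE Subset" ) );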
So these are the rows that were selected. And now we're going to go to Analyze. Fit model. And we want...octane is what we're looking for, right. So that's our rating that we're looking for. And we're going to use all of our wavelengths And this is also the next thing I'm going to show you is a JMP pro feature. Where you select partial least squares and the fit model platform. This you can do partial least squares or you can do the same analysis and regular JMP, just so you know that, but we're setting it up here in case if you wanted to, you could actually look for interactions. We're not going to worry about that on this model. And select run And got to make a few modifications. Here you can choose which method you want, the NIPALS or the SIMPLS. The SIMPLS is probably a little more statistically rigorous but NIPALS works for our purposes. The validation method, we do want to do that. But we don't have very many samples. So we're going to use the leave one out. Okay, so each row will be it's... pulled out and used as the validation. Okay. So we're going to start and just say go. As you can see up here on the percent variation explain for the X and the Y, we're doing very well. The model is explaining quite a bit of the variation for both the X and the Y, 90 almost 100%. That's great but they're using nine latent factors. Remember, we only had four functional principal components. So let's see what happens when we go to that. Change to four. And select go and we do lose something in our model, but it's not bad, right, so we're still getting a decent overall fit. And that's where we're going to go. Alright, we're going to use that model instead of the more complicated model with the nine latent factors. So I'm actually going to remove this fit. OK, and then we're going to look at this four factor partial least squares fit. What we're looking for down here is that the data isn't spread out in some wild fashion. They are, you know, for the score plots, the data is somewhere close to around this, the fit line, and we're okay with that. And if we're looking at other parts of this, we've got, look again, we're looking at what's...how much of the data is being explained, what are the variations being explained and we're looking at 97% there, almost 99% here for Y, and that's good. Let's look at a couple of other things while we're here, And look at the percent variation plots, which gives us an idea of, you know, how are these things different or how are these spectra are different and we can see that latent factor one is explaining a fair amount of the differences but latent factor two is explaining the better, more important part of that. Alright, so that's where we're kind of dialing into; three and four are still part of the model but they're not as important. So something else we can look at are this variable importance plot. There is a value here. It's a value of .08, right here that dotted red line. If you wanted to do variable reduction, you could do that here. Alright, so you could actually lower the number of Wavelengths you're looking at here, but we're going to leave that as is, right. And the way to do that to actually make that change, to actually do the variable reduction would be through this variable importance table coefficients and centered and scale data, you could actually make a model based on the variables, the important factors. Right. And you can see this again, that dotted line is the cutoff line and a fair number of those wavelengths would be cut out of the model. 
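Fit Model with the Partial Least Squares personality is what's shown here; the standalone platform can also be launched from JSL. A rough sketch with a few hypothetical wavelength columns standing in for all 400; the method (NIPALS) and leave-one-out validation choices are made in the launch or report, and the exact option names may differ from what is written below.

    // Rough sketch: launch Partial Least Squares on the subset table, with a few
    // hypothetical wavelength columns standing in for all 400. Method and validation
    // options (NIPALS, leave-one-out) are then chosen in the dialog/report.
    doeSub = Data Table( "DOE Subset" );
    doeSub << Partial Least Squares(
        Y( :Octane ),
        X( :NIR 900, :NIR 902, :NIR 904, :NIR 906 )
    );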
Right. But again, we're going to keep that, and we're going to go up here to the red hotspot, go to Save Columns, and save the prediction formula. Alright, so that's saved back to the data table. I'm going to minimize that, and we've got this formula out here; that's our new formula. If I go to Analyze, Fit Y by X, go to octane, grab that formula, and say OK, great, we fit the model and our R square is around .99, and that's really great. But the question is, how does that work for the rest of the data? I'll show you that in a minute. Before we get there, I want to show you a separate method, another opportunity, and I'll show you the setup and the model. You would do Analyze, Fit Model; we're going to do Recall, and this time, instead of using partial least squares, we're going to use generalized regression. Select that, we've got a normal distribution, and we can go ahead and select Run. We're going to change a few things here. Instead of using lasso, we're going to use elastic net. Then under the advanced controls we'll make a change in a second. But this validation method, remember we used leave-one-out, so we'll change that. We're going to select the early stopping rule, and we're also going to make this change here under the advanced controls. This is what kind of drives why I even use generalized regression at all: it helps make a simpler model. If you blank out that elastic net alpha and then run your model, when I click Go it steps through lasso, elastic net, and ridge regression, all those steps; it fits, or tries to fit, all those models and then it gives you the best elastic net alpha. Doing that takes a little bit of time because you're building all those different models, so I'll show you the outcome of that in this fit right here, which I had done earlier. This would be the actual output that I got from that model, again with leave-one-out, and it gave me 41 nonzero parameters. If I show you the other model, that partial least squares model has 400 wavelengths. So we've basically reduced the number of active factors by a factor of 10 with this elastic net model. We can look at the solution path and change things, and we can actually reduce the number of factors we want or add more, but for the most part we'll just leave the model as is. We would save this model back to our data table; I've already done that. And now let's compare those. That's this model right here, the information right over here on the left. I passed it too fast. This highlighted column: if I right-click there and go to the formula, I can look at that, and again, these are the important wavelengths for that model for predicting octane. If I look at the partial least squares model and go to its formula, this is the partial least squares model, and again, it uses all 400 wavelengths. So it's a much more complicated model. You're more than welcome to use it; it's actually a very good model, so there's no reason not to use it, but if you can build a simpler model, it's always a good thing. Alright, so we've got these formulas in our new subset table and we want to transfer those back to the original data table.
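Once the prediction formula is saved back, the actual-versus-predicted check is just Fit Y by X. A sketch; the saved column name follows JMP's usual "Pred Formula" naming, but treat it as an assumption and verify against your table.

    // Sketch: compare actual octane to the saved PLS prediction formula column.
    // "Pred Formula Octane" follows JMP's usual naming; verify the actual column name.
    dt = Current Data Table();
    dt << Bivariate(
        Y( :Pred Formula Octane ),
        X( :Octane ),
        Fit Line()
    );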
So again, right click formula, copy this formula, right, and then you would go back to your data table, make a new formula column over here. Right click Go to formula and paste that formula in your data set. All right. Well, I've already done that to save some time. Okay. And we've got, I've got both models there. I've got the partial least squares model in there, and really, what we're going to come down to, is we're going to go to Analyze fit Y by X. And we're going to go to octane rating, right, and I've previously done the PLS analysis. Now this model was built with a...48 samples were the training set and 12 for the validation side. Alright, so that's there. I've got my generalized regression formula, and I've got my octane prediction formula. Actually, this is the other PLS approach right here. And this one. And we're going to add those, and we're gonna say okay and compare those. And now you can see in here where we're doing very well overall. The models are doing well. We're still doing about 97% for our generalized regression model, in the end, which is still good. The PLS model beats it out a little bit, but then, remember, that's a much more complicated model. And overall, you know, we've built this nice predictive model that we can share with others. And as you get new spectra entries, analyze new spectra, all you have to do is drop those wavelengths into your data table and see what the octane rating is. All right. So you've made that analysis, you've made that comparison. And if nothing changes, in the day or over a period, of course, of a couple of days with your calibration, this should be a good model. It should be a sustainable model for you. So with that, I believe, I'm going to go back to, well, let me show you one more thing. I'm going to go to another data set that I wanted to share with you. This is the... as I'm trying to find it...I go to my home window... And this is a mass spec study for, actually it's a prostate cancer study and there's some unusual data with that. Right. And I'll want to show you... there's a couple of different ways, but what I want to show you is, pull this in here. Right. So instead of...this is abnormal versus normal status and... Showing you the power of the tools for...let's go to Analyze and then all the process, model driven multivariate...before I go there, let me color on status. Alright, so we'll do that, we'll give them some markers here. Okay. We're gonna go to Analyze... quality and process, model driven multivariate control chart. All of our wavelengths. Right. It takes a second to output, but I've got all the, right now, looks like I've got all the normal data selected, right, so that's what you're seeing there, if I click there. The red circles are the abnormal data, and for the most part, we see that there's a lot more of those out of control, compared to the normal data, right. The nice thing about this is if I could pull up one of those, I can start seeing which portion of the data is different than what we're seeing with the so called in control or normal data, right, and... Oh, Back to that. There. Gonna show you something else. Go to...we want to monitor the process again or look at the process a little deeper. So let's go to the right hotspot, monitor the process and then we're going to go to the score plot, right. So now we can compare these two groups. Well, we have to do a little bit of selection here. So let's go back to the data table. Right click, say select matching cells. 
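Copying the formula by hand works; a scripted version of the same move uses Get Formula together with the Eval / Eval Expr pattern, a common JSL idiom. Table and column names below are hypothetical.

    // Sketch: copy a prediction formula column from the subset table to the full
    // table. Table and column names are hypothetical.
    doeSub = Data Table( "DOE Subset" );
    main   = Data Table( "Gasoline NIR" );
    f = Column( doeSub, "Pred Formula Octane" ) << Get Formula;
    Eval( Eval Expr(
        main << New Column( "Pred Formula Octane", Numeric, Continuous,
            Formula( Expr( f ) ) )
    ) );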
And we're gonna go back over here and that's all selected, so that we're going to make that abnormal group, our group A, right. Go back to the data table. And scroll down. Select normal. Select matching cells and now that's going to be our Group B so now we can compare those. And now we can see where there's differences in the spectra, like, so this is maybe on the more normal side that you won't see in the abnormal side, right. But you're gonna...there are a lot more differences that you're going to see in the abnormal side that you would not see in the normal side, right. So this allows you to, again, dig deeper and better understand that. And finally, if I do this analysis for the functional data explorer with this grouping... Again rows as functions. Right. Y output. Status is our supplementary variable. Sample IDs, ID function. Say okay. And we'll fit this again with a P-spline model. This will take a second. While we're waiting for this to happen, I'm just going to show you, at the end, the generalized regression portion of this will be done, but I just want to show you what it's like looking at a categorical data set with the functional data explorer. Using that functional DOE capability. It ends up being, could be very valuable. And when you're looking for differences in spectra. And again, this is mass spec data. This isn't your IR data. This is mass spec data. We fit it, we've looked at our different spetra, how it's fit and we're happy with that. We can look at the functional principal components. Can look at the score plots. Let's look at the functional DOE. And again, where do we see differences? If I go over here and we're looking at abnormal spectra. It doesn't have this peak that the normal does, right, so now we can look at that and see, you know, again, help us better understand what differences we might see. All right. And in closing, let's go to back to the PowerPoint. Alright so what this process allows you to do is compress the available information with respect to wavelengths or mass or whatever it happens to be. Use this covariate DOE to help you select the so called corners of the box for getting a good representative sample of data to analyze. Model that data with a partial least squares, generalized regression. You can also use more sophisticated techniques like neural nets. And as new spectra comes in, you put the data into the data table and you see where it falls out. So this is highly efficient or helps you be more highly efficient with your experimentation and your analysis. And again, build that sustainable empirical model. Looking forward, the data that I've used is fairly clean and we're looking at working with the our developers and looking at how we can preprocess the spectral data and get even better analysis and better predictive models.  
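The group A / group B comparison starts from a row selection; that selection can also be scripted, for example (the column name and value are hypothetical):

    // Sketch: select all rows with a given status before assigning them to a
    // comparison group in the score plot (the group assignment itself is done in
    // the report). The column and value are hypothetical.
    dt = Current Data Table();
    dt << Select Where( :Status == "Abnormal" );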
Christian Stopp, JMP Systems Engineer, SAS Don McCormack, Principal Systems Engineer, SAS   Generations of fans have argued as to who the best Major League Baseball (MLB) players have been and why, oft citing whichever performance measures best supported their case. Whether the measures were statistics of a particular season (e.g., most home runs) or cumulative of their career (e.g., lifetime batting average), such statistics do not fully relate a player’s performance trajectory. As the arguments progress, it would be beneficial to capture the inherent growth and decay of player performance over one’s career and distill that information with minimal loss. JMP’s Functional Data Explorer (FDE) has opened doors to new ways of analyzing series data and capturing ‘traces’ associated with these functional data for analysis. We will explore FDE’S application in examining player career performance based on historical MLB data. With the derived scores we will see how well we can predict whether a player deserves their plaque in the Hall of Fame…or is deserving and has been overlooked, as well as compare these predictions with those based solely on the statistics of yore. We’ll confirm Ted Williams really was the greatest MLB hitter of all time. What, you disagree?! Must be a Yankees fan…     Auto-generated transcript...   Speaker Transcript Christian So thank you, folks, for joining us here today at the JMP Discovery Summit, the virtual version. My name is Christian Stopp. I am a JMP systems engineer. And I'm joined today by my colleague Don McCormack, who's a principal systems engineer for JMP as well. And you probably got here because you saw the title of the talk. And you saw this was...you're a baseball fan about Major League Baseball players and wanted in or you saw it was about functional data explorer and you wanted to learn a little bit more about how to employ functional data explorer in different environments. So we're going to marry those two topics today. Don and I and I'm going to gear my conversation a little more for the baseball fans first. Just as we're having kind of common conversations among baseball players and baseball fans, you might think about how your favorite player does relative to other players and you might have with your friends, these conversations and hopefully they're kept, you know, polite about about who your favorite player is and why. And so that's kind of how I imagined this infamous conversation between Alex Rodriguez and Varitek going was just about who...comparing notes about who their favorite player was. And so for me, my origin started off, and like Don's, with respect to just be having a love for baseball and being interested in the baseball statistics that you'll find in the back of the bubble gum cards we used to collect. And so as you have these conversations about who your favorite player is, you might note that players differ with respect to how good they are, but also different things like when they age... as they age, where they peak, like where the performance starts to go off over time. And so as you're thinking about maybe like me the career trajectories of these players, you might want to question, Well, how do I capture or model that performance over time? Now, if you're oddly like me, you decide that you want to pursue statistics so that you can do exactly that. 
But I would encourage you to skip that route and be smarter than me and just use a tool like functional data explorer to help you turn those statistics...statistical curves into numbers to use for your endeavors. So for those of you who are a little less familiar with baseball, but what we'll be seeing is data reflecting things that are measures of baseball performance. So I'm going to be speaking about position players and position players bat. And so one of the metrics of their batting prowess is on-base percentage plus slugging percentage or OPS. And so on the Y axis, I've got that that measure for a couple of different players as they age. And the blue is Babe Ruth and the red is Ted Williams. And as you can see, you get a sense from these trajectories that they both appear to have about the same quality of performance over most of their careers. But you might know that where they peak might seem to be a little at an older age for Ted Williams, as opposed to maybe Babe Ruth. And Babe Ruth, it looks like he maybe needed just a little bit of time to just get up to speed to get to that measure if you're just looking at this plot without any other knowledge. So there's a lot of...this is just two players in the thousands of players or tens of thousands that you might be considering and just look at comparing, you can imagine there's a lot of variability about these characteristics of their career trajectories. So there's also clearly variability within a player's trajectory, too. So I might use the smoothing function of the Graph Builder here and just smooth out the noise associated with those curves a little bit, to get a better sense of the signal about that player's trajectory. And it turns out that that smoothing is is very similar to what's going on in that process that functional data explorer employs. So here I've got functional data explorer and again I'm...my metric here is on-base percentage plus slugging percentage, OPS. And I'm looking just to see...like we're comparing these these player trajectories, now, in FDE is, functional data explorer, is smoothing out those player curves, as you can see, and then extracting information about what's common across those curves. And so for every player now, what you get in return for doing that is, are scores that are associated with that player's performance. And so these scores describe the player's career trajectory in a nice little quantitative way for us to take away and use another analyses like we'll be doing. So it's just, you can see that a little bit, these are Hank Aaron scores. And in the profiler that you'll...that you can access in the functional data explorer, you can actually change...you can look at that trajectory here for that player's OPS over age and then change those values to reflect what that player's scores are and get a better...replicate their their career trajectory with those scores. Right, so that's a little bit about FDE and and how to employ it here. So you'll see Don and I talking about these statistics that we're now equipped with, these player scores that we get out from the functional data explorer, that gets it from those curves that we started off with. And so we're going to use that...some what we're doing is predicting like maybe Hall of Fame status. 
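The smoothing described here is the ordinary Graph Builder smoother; a sketch with hypothetical column names for the prepared batting data:

    // Sketch: overlay smoothed OPS-by-age trajectories for a few players.
    // Column names are hypothetical stand-ins for the prepared batting data.
    dt = Current Data Table();
    dt << Graph Builder(
        Variables( X( :Age ), Y( :OPS ), Overlay( :Player ) ),
        Elements( Points( X, Y ), Smoother( X, Y ) )
    );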
So you'll see Don and me talking about these statistics we're now equipped with, the player scores we get out of Functional Data Explorer from the curves we started with, and what we'll be doing with them is predicting things like Hall of Fame status. And not only whether the players who are in the Hall of Fame belong there, but, more interestingly, who are the players in the Hall of Fame who maybe shouldn't be, because the stats don't support it, and which players the Hall of Fame committee seems to have snubbed. So we'll talk a little bit about the different metrics we used and how we revised them, and then about taking those career trajectories through FDE, getting the scores out, and doing the prediction the way we normally would with anything else. If you haven't followed baseball, the Hall of Fame eligibility requirements are that a player has to have played at least 10 seasons, then wait five years before becoming eligible, and then has a 10-year window during which voters can elect him. So there are a couple of players we'll see who are still waiting for the call. The Hall's selection criteria are primarily about how well the player performed, but they also take into account things that the data source we're using, the Lahman Database, doesn't include and that are hard to measure, so we'll stick with analyses that reflect players' statistical prowess on the field. And of course, after 150 years of baseball, you have to recognize that players played in different eras, so we want to make sure we're comparing players to their peers. We'll take the years they played into account, and the position they played, since different expectations typically go with different positions; and different leagues have different rules, so we'll weigh that in too. That's where I'm going to stop. Don's going to take over with pitching, and then I'll come back and talk about position players. donmccormack So like Christian said, I'm going to talk a little bit about pitching, but before I do that, I'd like to illustrate some of those initial points Christian mentioned: the good data-analytic practices that really need to be done regardless of what modeling technique you use, and that turn out to be good things to do before you model your data with FDE. I'm going to talk specifically about cleaning the data, about normalizing the data so you can compare people on equal footing, and finally about modeling the data. As an illustration, what you see on the screen right now is three very different pitchers who are all in the Hall of Fame. The red line is Nolan Ryan, a very long career, about 27 years. The green line, the middle one, is Hoyt Wilhelm. Some of you younger folks might not know who Hoyt Wilhelm is; he pitched from the early 50s through 1972, I believe. A fairly long career that spanned multiple eras. He was mostly a reliever, but not like the relievers you know today; when he came in to relieve, he might pitch six innings. Very atypical compared with today's relievers. The blue line is Trevor Hoffman, the great closer for the San Diego Padres, and again a very different pitcher. So the question is: how do we get this data ready and set up in such a way that we can compare all three of these pitchers equally?
The first thing I mentioned is cleaning up the data. By the way, I'm going to use four different metrics: WHIP (walks and hits per inning pitched), strikeouts per nine, home runs per nine, and a metric I created called percent batters faced over the minimum, where I take the number of batters a pitcher faced, divide by the total outs they recorded, and subtract one. The idea is that if every batter a pitcher faced made an out, that ratio would be a perfect one and the metric would be zero. So I'm going to look at those four metrics. I've got criteria for how I define my normalization and how I screen outliers, and I'll include a PowerPoint deck with the details, but I won't go through them here for the sake of time. So the first thing I'm going to do is clean up the data. You'll notice, for example, that in his very first year Nolan Ryan pitched only three innings, with a very, very high WHIP, and there are a couple of seasons in here where, I think, Trevor Hoffman pitched very little as well. So I'm going to start by excluding that data. That's nice: it shrinks the range, and it's always good to get the outliers out of the data before you do the analysis. One other step I want to mention is that when I used FDE on this data, the platform allows you to do some additional outlier screening where, even if you're using multiple columns, you're not screening out the entire row, only the values for that given input, which is a very nice feature. I used that as well, because even after my initial screening there were still a few anomalies I needed to get rid of. So that's cleaning the data; normalizing it is the second step. By normalization I mean that I normalized on the x axis and on the y axis. What we're looking at here is the number of seasons, each taken as a separate, whole entity, but we all know that in some seasons a pitcher throws more innings than in others. So rather than using seasons as my entities, I'm going to use the cumulative percent of career outs: I know that at the end of each season a pitcher has recorded so many cumulative career outs, and that's a certain proportion of his total career outs, so I'll use that to scale the data. The great thing about that is that now all three pitchers are on the same x scale; everything runs from zero to one, which is a really nice thing to have from the standpoint of an FDE analysis. And then finally I want to scale on the y axis as well. All I've done is divide the WHIP by the average WHIP for that pitcher type and for the era he pitched in, so I have a relative WHIP. The other nice thing about using these relative values is that I know where my line in the sand is: a pitcher with a relative WHIP of one is an average pitcher, so in this case I'm going to be looking for the guys with WHIPs under one. And you'll notice that all three of these pitchers, for most of their careers, were under that line at one.
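As a concrete illustration of those cleaning and normalization steps, here is a minimal pandas sketch. It assumes one row per pitcher-season with illustrative column names (player, season, outs, bf, whip, pitcher_type, era), and the workload cutoff is a placeholder rather than the actual screening rule from Don's slide deck.

```python
# Sketch of the cleaning/normalization steps: a derived metric, outlier screening,
# an x axis rescaled to cumulative percent of career outs, and a relative WHIP.
import pandas as pd

def normalize_pitching(seasons: pd.DataFrame, min_outs: int = 30) -> pd.DataFrame:
    df = seasons.copy()
    # "Percent batters faced over the minimum": batters faced per out, minus one
    # (zero if every batter faced made an out).
    df["pct_bf_over_min"] = df["bf"] / df["outs"] - 1
    # Clean: drop tiny-workload seasons that produce extreme values.
    df = df[df["outs"] >= min_outs].sort_values(["player", "season"])
    # Normalize the x axis: cumulative percent of career outs, running from 0 to 1.
    career_outs = df.groupby("player")["outs"].transform("sum")
    df["cum_pct_outs"] = df.groupby("player")["outs"].cumsum() / career_outs
    # Normalize the y axis: divide by the average WHIP for that pitcher type and era.
    era_mean = df.groupby(["pitcher_type", "era"])["whip"].transform("mean")
    df["relative_whip"] = df["whip"] / era_mean
    return df
```

A relative WHIP of one is then the "line in the sand" for an average pitcher, which is what makes the career-level comparisons later in the talk possible.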
Now the final thing I want to do is use FDE to model that trajectory. There are two problems with using the data as is. One is that it's pretty bumpy, and it would be really hard to estimate a career trajectory with all of these ups and downs. The second is that eventually I want to use the trajectory I get from FDE to come up with some overall career estimate, so rather than treating seasons, or cumulative percents, as discrete entities, I want to be able to model performance over the entire continuous career. We'll see that a little later on. So I'm going to replace my relative WHIP with this conditional FDE estimate. Now, seeing me flip back and forth between the two, you might say, boy, that's a huge difference; is it really doing a good job? It's hard to tell from that graph, so let me show you what it looks like. Here I've pulled up the discrete measurements for Nolan Ryan along with the curve for his conditional FDE estimate. You'll see that it doesn't follow the same jagged, bumpy path, but it does a good job of estimating his career trajectory: his WHIP was high at first because he walked a lot of people, a very wild pitcher, much more wild in the beginning part of his career, believe it or not, and as his career went on that got better. You'll see this in any of the pitchers I picked. For example, let's go to Hoyt Wilhelm. Here's Wilhelm: again, it doesn't capture the absolute highs and lows, but it does a good job of modeling the general direction his career took. Okay, so let's use that to ask some questions. I only have a limited amount of time; I wish I had more, because there are some neat things I could show you, but I'm going to start with what I call the snubbed. I used FDE on those four metrics I mentioned, used the scores as inputs along with the pitcher type, and tried a whole bunch of predictive modeling techniques. The two that worked best for me were naive Bayes and discriminant analysis, and I used those two techniques to tell me who should be in and who shouldn't. What we're looking at here are the pitchers where both the naive Bayes and the discriminant analysis said yes, but the Hall of Fame said no. These are my snubbed. Let me switch over to the relative WHIP, the conditional WHIP, and put that reference line back in at one, and you'll see that, for the most part, these are pitchers who spent the bulk of their careers under that line at one.
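That "both models say yes, the Hall says no" screen can be mocked up generically outside JMP. The sketch below uses scikit-learn stand-ins for the two techniques Don names; the FPC-score feature columns and the 0/1 in_hall column are assumed names, and in practice you would cross-validate rather than score the training data.

```python
# Flag "snubbed" pitchers: both a naive Bayes and a (linear) discriminant model
# predict Hall of Fame membership, but the player is not actually in the Hall.
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def flag_snubbed(scores, feature_cols, target_col="in_hall"):
    X = scores[feature_cols]
    y = scores[target_col]                      # assumed 0/1 coding
    pred_nb = GaussianNB().fit(X, y).predict(X)
    pred_lda = LinearDiscriminantAnalysis().fit(X, y).predict(X)
    mask = (pred_nb == 1) & (pred_lda == 1) & (y.to_numpy() == 0)
    return scores[mask]
```

Flipping the last condition (both models say no, but the player is in) gives the "gifted" list that comes up later in the talk.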
Now the other thing you might think, looking at this data, is: wow, it would be really hard to tell these players apart. How would I compare them if I were to add, say, a few pitchers who are in the Hall to this list? It would be hard to separate them just by eyeballing the curves, because in some parts of their careers some would be better than others, and in other parts they would switch. How do I deal with this at a career level? As I mentioned earlier, one of the nice things about Functional Data Explorer is that I can take that data, create a career trajectory, and estimate a whole bunch of data points along that trajectory. And I did that: I broke each career into 100 units and summed over all hundred units for each of my curves, so basically I got something like an area under the curve. If a point was above that line at one, I would add; below the line, I would subtract. And if we look at total career trajectories, this is actually a larger list, approximately 1,300 or 1,400 pitchers, absolutely everyone who was Hall eligible with 10 years or more. So let's quickly go through a couple of things we can do with this. Let me start with the players who were snubbed; these are my snubbed players. These sums are built from 100 values, so the line in the sand here is 100, because I've got 100 different values I've measured, and you'll notice that for the most part these players were above 100. Here's the list of the players who didn't make the Hall. If you look at them, there are a couple of guys who are obvious: people like Curt Schilling and Roger Clemens are not in there for non-career reasons, some of the other criteria Christian mentioned. And there are some guys, Clayton Kershaw for example, who are still not done with their careers. But there certainly are other people you might consider who are Hall eligible. So let's look at that too, at the folks who are Hall eligible and have not been voted in: BJ Ryan; again, Curt Schilling is in there; Johan Santana, and I'm not sure why he didn't make it into the Hall; Smokey Joe Wood, a pitcher from the early 1900s; and so on. So the ability of FDE to let me extract values from anywhere along a career trajectory gives me an extra tool for building additional criteria around who belongs in the Hall and who doesn't. Enough said about the pitchers; I'm going to turn it back to Christian so we can talk a little more about the position players. Christian Excellent, thank you, Don. Okay, so Don was talking about the pitchers, and I'll be looking at the position players, and there are two different components that go into that: you have your batting prowess as well as your fielding prowess. I took a little different take than Don did with respect to the statistics and then building the models. I started off with just four of the more common batting statistics, the first four on the list here, the kind of thing you'd find on the backs of baseball cards. Then, as I progressed, as we'll see, I needed something to capture stolen bases, because the first four don't do that at all, so I created a metric I call the base unit average, which brings in other base-runner movements to give credit to the batter for those things.
And then fielding, of course, is a factor as well, as we'll see, so I included a couple of metrics for fielding. Like Don mentioned earlier, I wanted to make sure I was comparing apples to apples, so I'm looking at those statistics with reference to position, league, and year. And, like Don, I wanted to make sure I weighted the smaller sample sizes appropriately so they weren't gumming up the system. So I ended up weighting each player's performance by his number of plate appearances relative to the average for that league and year at a particular lineup slot, that is, how many plate appearances that slot would typically get over the course of a season. That's how they're weighted. Right, so let's see what that looks like. We're going to go back and visit Ted Williams again. We've got Ted Williams' career on the left here, and these are the raw scores, and it looks like he had a really poor season here. But once you take the relative view of it, you can see it was actually an average season; as with Don's metrics, it's still above that average line of one. It was just a poor season by Ted Williams' own standards. And we saw earlier that these two peaks for Ted Williams might have looked like his peak performance, but it turns out those are seasons where he had a smaller number of plate appearances because he went off to the Korean War. That affected his scores, and I weighted them back toward the average because of the smaller sample sizes. So those are the types of data. I'm going to focus on the relative statistics in my part of the talk and just highlight some of the things that caught my eye. There we go. We need the table of numbers to feed in from FDE, so here are the scores we'll be looking at, the relative FPC scores from FDE. The first thing I saw: I included four variables in my model, those first four batting statistics, and I wanted to make sure I had the right components in my analysis. On the y axis here is the model-driven probability of being in the Hall of Fame, and on the x axis is whether or not the person actually is in the Hall of Fame, so my misclassification areas are these two sections here. I noticed there were more players down here than I was expecting, so I started exploring variables I hadn't yet included, like stolen bases, and I popped those in for color and size. As you can see, it seemed pretty clear to me that stolen bases is definitely a factor the Hall of Fame voters were taking into account; the color and size here reflect the number of stolen bases over a career. This is what drove me to create that base unit average statistic that I then used. So, as I was exploring those models, I started off with four statistics and then added in that BUA statistic; this is my x axis now. Then I added in the fielding statistics, and what we have here is a parallel plot, where the y axis is again the probability from the model that the player should be in the Hall of Fame, and each line is now a player. The color represents Hall of Fame status: red is yes, already admitted, and blue is no.
I like this plot because it lets me see who's moving, so I can see the impact of those additional variables in the model. Of course the first thing that caught my eye was this guy here, who pops up from a "not really" once you add the stolen base component, and we can see that he has a high probability of being elected to the Hall of Fame, or of belonging there, depending on how you look at it. It's Rickey Henderson, who happens to be the career leader in stolen bases. Another player, looking at the defensive side of things, is Kirby Puckett. His initial statistics, based on the initial model, suggest he makes it, qualifying just across the line. But if you add in the stolen base component, he actually doesn't seem to qualify any longer. And then finally, when we put in the fact that he was a really good fielder, he won a number of Gold Gloves playing center field for the Twins, we see that he's back in the good graces of the Hall of Fame committee, and rightfully voted in. This is kind of a messy model, or not a messy model, there's just a lot going on here. So I ended up adding a local data filter so I could look at each position individually, and here, for first base, it's a lot easier to see the folks in red and the folks in blue. Now we've got somebody here, Todd Helton, who, at least in all the models we were looking at, should be admitted to the Hall of Fame, and he's still eligible, so he's still waiting for the call. And someone like Dick Allen, also in blue, not in: his numbers, at least based on the summary stats, the FDE statistics we're using, and the models, suggest he belongs in the Hall. And there are other folks in red down at the bottom, like Jim Thome, whom the models suggest doesn't really belong but who was voted in. So, those are different ways of exploring the relationships as we add in the predictors. Now, like Don, I wanted to get a sense of who was snubbed and who might have been gifted, or at least had non-statistically-oriented components to his consideration. So, like Don, I ran a number of models and settled on four that I liked and that did the best predictive job, and, like Don, rather than just using age as the x axis in my FDE, I also used cumulative percent of plate appearances. Having those two variants gave me a number of models to look at. So I drilled down to just the folks who, across all eight models, are in the Hall of Fame but whom none of the models suggest should be; that's this line here, and there are 31 of them. On the reverse side, in green, are the folks whom the majority of models in either bucket, age versus cumulative percent of plate appearances, suggest do belong in the Hall of Fame but who are not in. So I pulled all these folks out, and, just like Don, wanted to compare what their trajectories look like: are they at least close, or is there something else going on here? What you can see here is on-base percentage plus slugging percentage, OPS, again.
It certainly looks like the folks who were snubbed, in red with the plus signs, performed a lot better on this metric, and, as it turns out, on every other offensive metric, than the gifted folks, the ones who are in but whom the models suggest shouldn't be. That made me think: is it just the offensive stats, and maybe fielding is where the folks who are already in shine? Based on fielding percentage, at least, the same pattern still holds: the gifted folks still don't look like they belong as much as the snubbed folks do. It was only on the range factor component that the tide reversed, and there you see the gifted folks come out ahead of the snubbed folks. So that's another take, much like Don's, that you can use to evaluate the components included in your model; there are a lot of different ways we can look at the data here. So, just wrapping up, because I'm sure some of you are burning to know who was snubbed and who was gifted: these are some of the folks who were snubbed, at least among the position players, and, as Don mentioned for some of his pitchers, a few of these folks are banned from baseball, so they're not exactly snubbed. You probably recognize some of these names. And these are some of the players who were gifted, or at least whose statistics alone may not have been what got them into the Hall of Fame. So, just wrapping up where we've been: we've been able to take player career trajectories of performance on a chosen metric, put them into Functional Data Explorer, and get out numerical summaries that capture the essence of those curves, and then, in turn, use those scores in the traditional statistical techniques we're familiar with. So now we can change the question from how to model or quantify a career trajectory to what we want to explore with the FPC scores we've got. We hope you enjoyed talking about baseball and the intersection of baseball, JMP, and FDE, and we hope you feel empowered to take the FDE tool that's available in JMP Pro and address questions with data, like who your favorite player is and why, and have the means of backing it up. Thanks for joining us. Take care. donmccormack Okay, so how do we deal with these cases where we need to look at somebody's career trajectory? Are there other metrics we can use to make these comparisons, so that we can tell these really fine gradations apart? As I alluded to earlier, we can certainly look at absolutely any point along a person's career trajectory, with any amount of gradation we want. And I did that: I took 100 data points, 100 values between zero and one, from the start of the career to the end, and I summed over all those values. The nice thing about this technique is that I can do it for multiple metrics, so what we're looking at now is a plot of all four metrics; we can plot them all on one graph. And we're going to go back again to that group of folks who were snubbed, these folks here.
So if we take a look at these folks, we see that they had a low... by the way, the reference is 100 in this case because there were 100 observations... home runs per nine, you want that low; percent batters faced over the minimum, low; and strikeouts per nine innings, you want on the high side. You'll notice that's roughly the pattern these folks follow. Now, the interesting thing at this point is that I can use any criteria I want. For example, let's say I'm going to consider all my players and only keep the people whose WHIP total was below, in this case, 100, so better than average; actually, let's make it stricter than that, say 90 or below. Then let's look at the folks who have at least the average number of strikeouts per nine innings, and whose percent batters faced over the minimum is at 100 or below, and I'll disregard home runs per nine here. You could also standardize and normalize by the number of seasons, and I've done exactly that, so I want to look at players with, say, at least 10 season equivalents, where a season equivalent is based on what an average player season looks like. And finally, what kind of workload they had over their entire career; let's say we want somebody who had at least 80 percent... let's make it a bit stricter, say about the same workload. Again, we can use different criteria to weed out the folks we don't think we should consider from the folks we do, and then, using those criteria, let's also look only at the folks who are not in the Hall of Fame. So here we go: now we have a list of people worth considering, and you'll notice there are quite a few folks on it who probably shouldn't surprise you. These are folks who are either not in the Hall yet because they're still playing or who have just been passed over. Chris Sale, for example, is still pitching. Curt Schilling, for obvious reasons, is not in the Hall. Johan Santana, why isn't he in the Hall? He was actually part of that group that was snubbed. So the nice thing about using these FDE results is that you can turn them into career trajectories and then use an additional metric to help determine Hall worthiness and non-Hall worthiness.
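Don's career-level summary and the criteria screen he demonstrates can also be sketched briefly. The curve_fn callable below stands in for a fitted relative-metric curve saved out of FDE, and the column names and cutoffs only mirror the numbers mentioned in the talk; they are assumptions, not the actual saved formulas or data.

```python
# Career-level summary: evaluate a fitted relative-metric curve at 100 evenly spaced
# points over the career (0 to 1) and sum them. For a metric whose average value is 1,
# the "line in the sand" for the sum is 100.
import numpy as np
import pandas as pd

def career_total(curve_fn, n_points: int = 100) -> float:
    grid = np.linspace(0, 1, n_points)
    return float(np.sum(curve_fn(grid)))

def candidate_screen(totals: pd.DataFrame) -> pd.DataFrame:
    """totals: one row per pitcher with assumed columns rel_whip_total, so9_total,
    pct_bf_total, season_equivalents, and a boolean in_hall."""
    return totals[
        (totals["rel_whip_total"] <= 90)         # clearly better than average WHIP
        & (totals["so9_total"] >= 100)           # at least average strikeouts per nine
        & (totals["pct_bf_total"] <= 100)        # no worse than average batters faced
        & (totals["season_equivalents"] >= 10)   # roughly a Hall-eligible career length
        & (~totals["in_hall"])                   # not already in the Hall
    ]
```

Summing evaluated points rather than raw season values is the design choice that lets every metric, and every pitcher, be compared on one common 0-to-1 career scale.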
Monday, October 12, 2020
Melissa Reed, MS Business Analytics and Data Science, Oklahoma State University   This project is about the early presidential primaries and how the results of those primaries affect who wins the Presidency. The research focuses on presidential primaries where a new President was elected: the elections of 1992, 2000, 2008, and 2016. The elections of 2000, 2008, and 2016 are examined because no incumbent was running; in the 1992 election, however, Bill Clinton defeated the sitting President, George H. W. Bush, to win the Presidency. The 1992 election is included because George H. W. Bush is the only President who has not been re-elected since the Cold War ended in 1991. The specific primaries examined are the Iowa Caucus, the New Hampshire Primary, and Super Tuesday, because they come early in the election cycle and help predict the rest of the country's primaries. The hypothesis for this research is that the candidate who wins most of the early presidential primaries wins the candidacy and the Presidency. JMP software is used to test the hypothesis. The research concluded that the person who wins the most primaries will most likely become the party's candidate but will not always win the Presidency.     Auto-generated transcript...   Speaker Transcript melissareed Hello, my name is Melissa Reed and I will be presenting my poster about the early presidential primaries. I am from Oklahoma State University. A little bit of background: a lot of people aspire to be President of the United States, but not many actually run for it. Campaigns usually start about two years before the November election, but a lot of campaigns do not make it to the Republican and Democratic National Conventions, either because they didn't get enough votes or, much of the time, because they ran out of money beforehand. The early presidential primaries this poster focuses on are the Iowa Caucus, the New Hampshire Primary, and Super Tuesday. These were chosen because they are three early primaries and, typically, the way these go, the rest of the country will follow; they are just really important. So the hypothesis for this project is that the person who wins the most votes during the early presidential primaries will more than likely win the Democratic or Republican candidacy for President of the United States. The elections of 1992, 2000, 2008, and 2016 were chosen because a new president won the Office of the President of the United States. 1992 is included because President Bill Clinton defeated the sitting president, George H. W. Bush, and George H. W. Bush was the first president not to be re-elected since the Cold War ended. 2000, 2008, and 2016 are included because there were no incumbents running. You can see on the poster that in 1992, 2000, and 2016, the candidates who won the Democratic and Republican candidacies for President of the United States were the two people who had the most votes across those three early presidential primaries. However, in 2008 Barack Obama and Hillary Clinton were the two candidates who got the most votes, but because they are both Democrats they could not both get the candidacy, and the Republicans nominated John McCain.
So to do the analysis, I used JMP to run a correlation analysis and a logistic regression. I ran the correlation analysis between the year and how many votes were cast, to see if there was a connection between them, and it showed that there is a connection between the year and the number of votes cast. I ran the logistic regression on the candidate, the year, and the state primaries to see which candidate was most likely to beat the other candidates. The results of that regression are down below in the results section. In 1992, the New Hampshire Primary was the one I focused on, because George H. W. Bush did not have anyone running against him in the Iowa Caucus, so I chose New Hampshire for that year. Across the elections of 1992, 2000, 2008, and 2016, the logistic regression showed that the person most likely to win is not always the candidate who actually got the most votes. In 2008, Barack Obama was shown to win some of the contests between him and Hillary Clinton, but not against everyone else. In conclusion, the person who wins the most votes in the Iowa Caucus, the New Hampshire Primary, and Super Tuesday will most likely win the Democratic or Republican candidacy and will run for President of the United States. In 2008 there was a difference, because the two people who won the most votes in those primaries were both Democrats. Thank you so much.
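As a rough illustration of the kind of logistic regression described here, the sketch below fits a nomination outcome against early-primary vote shares. It is not the JMP analysis itself: the table and its columns (year, vote shares in the three early contests, and a 0/1 won_nomination indicator) are assumed for illustration, and with only a handful of election cycles you should expect wide intervals or separation warnings.

```python
# Minimal logistic-regression sketch: does doing well in the early contests
# predict winning the party nomination?
import pandas as pd
import statsmodels.formula.api as smf

def fit_primary_model(primaries: pd.DataFrame):
    """primaries: one row per candidate-year with assumed columns
    won_nomination (0/1), iowa_share, nh_share, super_tuesday_share, year."""
    model = smf.logit(
        "won_nomination ~ iowa_share + nh_share + super_tuesday_share + C(year)",
        data=primaries,
    ).fit()
    return model

# Usage sketch:
# model = fit_primary_model(primaries)
# print(model.summary())
```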