Level: Intermediate. Reporting, tracking, and analyzing adverse events that occur in study subjects are critical to the safety evaluation of clinical trials. Many pharmaceutical companies, and the regulatory agencies to which new drug applications are submitted, use JMP Clinical to support this evaluation of adverse events. Biometric analysis programming teams may prepare static tables, listings, and figures for medical monitors and reviewers. This creates an inefficiency: the physicians who understand the medical impact of a given event cannot interact directly with the adverse event summaries. Yet even producing simple counts and frequency distributions of adverse events is not always straightforward. This presentation focuses on the key adverse event outputs of JMP Clinical: counts, frequencies, incidence, and time to event. JMP Clinical's reporting capabilities go well beyond the conventional, making fully dynamic adverse event analysis easy even when the underlying calculations are complex and rely heavily on JMP formulas, data filters, custom-scripted column switchers, and virtually joined tables.   Kelci Miclaus is a manager in JMP Life Sciences R&D and develops the statistical features of the JMP Genomics and JMP Clinical software. She joined SAS in 2006 and holds a Ph.D. in statistics from North Carolina State University.
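As a rough illustration of the interactivity described above, the JSL sketch below attaches a column switcher and a local data filter to a simple frequency report. The table name (adae.jmp) and the column names (:AEBODSYS, :AEDECOD, :TRTA) are assumptions borrowed from CDISC-style adverse event data, not the JMP Clinical data model itself.

// Hypothetical adverse event table and columns; adjust names to your data.
dt = Open( "adae.jmp" );
dist = dt << Distribution( Column( :AEBODSYS ) );               // counts per body system
dist << Column Switcher( :AEBODSYS, {:AEBODSYS, :AEDECOD} );    // switch between classification levels
dist << Local Data Filter( Add Filter( Columns( :TRTA ) ) );    // restrict to a treatment arm interactively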
Level: Beginner. Presentation title: JMP as a Tool for Spatial Data and Morphometric Analysis: An Attempt at Grading Carcinoma in Situ of the Upper Aerodigestive Tract and Its Precursor Lesions Using Single-Linkage Cluster Analysis.   Most malignant tumors of the upper aerodigestive tract mucosa, such as the oral cavity, pharynx, and larynx, are squamous cell carcinomas arising in the stratified squamous epithelium that covers the mucosal surface. Lesions regarded as precursors or early stages of these carcinomas are known clinically to appear as white or red patches on the mucosa, and on microscopic examination of tissue taken from patients they are termed epithelial dysplasia and carcinoma in situ, respectively. Dysplasia is further divided into three grades, mild, moderate, and severe, according to the degree of cellular atypia and the proportion of the epithelial layer the atypical cells occupy. This grading is performed intuitively by visual inspection by pathologists and is thought to be reasonably reproducible, but objective studies are few. In this study we quantified the arrangement of cells (nuclei) within the epithelial layer and examined how it differs among non-neoplastic (normal) epithelium, dysplasia, and carcinoma in situ. Using digital image analysis, we extracted the centroid coordinates of cell nuclei from photomicrographs, generated minimum spanning trees (MST) connecting the centroids with single-linkage hierarchical cluster analysis in JMP ver. 15, and compared histograms of the branch lengths; differences were found among the groups.   千場 良司: Former lecturer at Tohoku University (加齢医学研究所病態臓器構築研究分野). M.D., Ph.D. Former overseas research fellow of the Ministry of Education (medicine) at Aarhus University, Denmark. In the field of human pathology he has studied the pathogenesis of disease using quantitative morphology based on geometric probability and integral geometry, digital image analysis, and multivariate statistical analysis. He has published research papers on liver cirrhosis, on early carcinomas and their precursor lesions arising in the alveolar epithelium, pancreatic duct epithelium, and endometrium, and on hepatic metastasis of cancer. (https://pubmed.ncbi.nlm.nih.gov/7804428/ , https://pubmed.ncbi.nlm.nih.gov/7804429/, https://pubmed.ncbi.nlm.nih.gov/8402446/ , https://pubmed.ncbi.nlm.nih.gov/8135625/, https://pubmed.ncbi.nlm.nih.gov/7840839/ , https://pubmed.ncbi.nlm.nih.gov/10560494/) From the standpoint of carcinogenesis and histological diagnosis, he is interested in mathematical methods applicable to their analysis, in particular numerical classification methods such as cluster analysis and discriminant analysis. As statistical platforms he moved from Fortran statistical subroutines on mainframes through SPSS and SYSTAT on PCs, and has been a JMP user since version 8, attracted by its excellent data table functionality and flexible analysis environment.     千場 叡: Graduated from the Department of Complex and Intelligent Systems, School of Systems Information Science, Future University Hakodate. As a student he was interested in complex-systems phenomena in physicochemical reactions and carried out experiments and research on the mechanism of capsule formation by thermal polymers of amino acids in an alcohol liquid phase. He is currently also interested in digital image analysis, data science, and the recognition and classification of forms and images using neural networks.
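The single-linkage clustering step described above can be sketched in a few lines of JSL. The table name and coordinate column names (:X, :Y) are hypothetical, and the Method option string is an assumption based on the Hierarchical Cluster platform's linkage choices.

// Nucleus centroid coordinates extracted by image analysis (hypothetical table and columns)
dt = Open( "centroids.jmp" );
hc = dt << Hierarchical Cluster(
    Y( :X, :Y ),
    Method( "Single" )    // single linkage (nearest neighbor); the joining distances
                          // correspond to the edge lengths of the minimum spanning tree
);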
Level: Beginner. 早崎 将光, Senior Researcher, Environment Evaluation Group, Energy and Environment Research Division, Japan Automobile Research Institute; 伊藤 晃佳, Group Leader and Senior Researcher, Environment Evaluation Group, Energy and Environment Research Division, Japan Automobile Research Institute.   Our main research themes are road traffic and the atmospheric environment, and the atmospheric environment and its effects on human health; road traffic volume is one of the key pieces of information. Cross-sectional traffic volume, one indicator of road traffic, is traffic-count information from vehicle detectors and similar sensors, published as five-minute data for each measurement point; about 2,400 cross-sectional traffic measurement points are currently published within Tokyo. Cross-sectional traffic volume is important as an indicator that captures road traffic over a relatively wide area in spatial terms. The state of emergency declared in response to the spread of COVID-19 changed social and economic activity substantially and is thought to have affected road traffic as well. In this study we analyzed changes in road traffic volume in Tokyo before and after the state-of-emergency period, using cross-sectional traffic volume as the indicator, and also examined changes in air quality over the same period. JMP was the main analysis tool. Using JMP's data table editing features, such as table join and concatenate, together with graphs linked to the data, we were able to carry out the analysis efficiently. This report introduces our use of JMP.   堺 温哉: Completed the doctoral program of the United Graduate School of Agricultural Sciences, Ehime University (Ph.D. in Agriculture). After positions as a JSPS Research Fellow (PD), at Hamamatsu University School of Medicine (educational assistant), Yokohama City University School of Medicine (assistant professor), and Shinshu University School of Medicine (specially appointed assistant professor), he joined the Japan Automobile Research Institute in September 2012 (senior researcher) and has held his current position since April 2020. His main current research theme is air-pollution epidemiology focused on Traffic Related Air Pollution (TRAP).   早崎 将光: Withdrew from the doctoral program of the Graduate School of Geoscience, University of Tsukuba, after completing the required coursework (2000), and received a Ph.D. (Science) from the Graduate School of Life and Environmental Sciences, University of Tsukuba (2006). After positions (postdoc, project researcher, and others) at the National Institute for Environmental Studies, the Center for Environmental Remote Sensing at Chiba University, the University of Toyama, Kyushu University, and the Atmosphere and Ocean Research Institute of the University of Tokyo, he has held his current position since 2017. His main research theme is identifying the causes of high-concentration air pollution events.   伊藤 晃佳: Completed the doctoral program in Environmental Resources Engineering, Graduate School of Engineering, Hokkaido University, in March 2002, Ph.D. (Engineering). Joined the Japan Automobile Research Institute in April 2002 and has held his current position since 2010. His recent work includes evaluation of source contributions to the atmospheric environment, analysis of air monitoring data (continuous monitoring stations and the like), and analysis using atmospheric simulation (CMAQ and the like).   No handout materials are available.
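The table operations mentioned above (concatenate and join) look roughly like the following in JSL; the table and column names are hypothetical stand-ins for the five-minute traffic-count files and a measurement-point attribute table.

// Combine monthly 5-minute count files, then attach measurement point attributes
apr = Open( "counts_2020_04.jmp" );     // hypothetical monthly files
may = Open( "counts_2020_05.jmp" );
all = apr << Concatenate( may );        // stack the rows of both months into one table
stations = Open( "stations.jmp" );      // measurement point attributes
joined = all << Join(
    With( stations ),
    By Matching Columns( :PointID = :PointID )   // hypothetical key column
);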
Level: Intermediate. Before analyzing data, an analysis data set must be created by extracting the required data, restructuring the correspondences between variables, and performing variable transformations, categorization, and recategorization. JMP provides database operations such as row and column extraction and table joins, along with the calculation functions needed for variable transformations, so an analysis data set can be created easily with these features. However, the analyst must specify the extraction conditions and transformation operations, and as those instructions become more complex, the likelihood grows that the result is not what was intended. For example, when setting ranges for data extraction or categorizing with If statements, the more complicated the "and"/"or" rules become, the greater the chance that the desired analysis data set has not been obtained. It is therefore necessary to check mechanically whether the analysis data set matches the analyst's intent. JMP's statistical methods can be used to verify the quality of an analysis data set. Finding the maximum and minimum of a variable is the simplest approach, but Distribution and Fit Y by X are also powerful; in Fit Y by X, an R-square of 1 provides the evidence. This presentation reports a case study in which the quality of an analysis data set was verified for large-scale data.   Lecturer, Department of Industrial Administration, Faculty of Science and Technology, Tokyo University of Science; senior researcher, Department of Chemical System Engineering, The University of Tokyo. His research specialty is statistical quality control. He mainly studies the statistical methods needed for quality control, and also applies statistical quality control to fire-protection architecture, fire phenomena, and medical and nursing care; using JMP, he models large-scale data to extract the information hidden behind it and feeds the results back to each specific field of study.
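A minimal JSL sketch of the kind of mechanical check described above. The column names are assumptions: :BMI_source is the value carried in the raw data and :BMI_derived is the value recomputed in the analysis data set; if the derivation is correct, the fitted line should report an R-square of 1.

dt = Current Data Table();
dt << Distribution( Column( :BMI_derived ) );                       // check range, missing values, outliers
dt << Bivariate( Y( :BMI_derived ), X( :BMI_source ), Fit Line );   // RSquare = 1 is the evidence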
Level: Beginner. In recent years, business intelligence and people analytics, in which the data that accumulates daily in corporate management is analyzed and visualized to support strategy formulation and decision making, have attracted attention. This presentation reports a case in which survey data about employees and organizations was analyzed with JMP and the results were used in consulting proposals to support management decisions. In management consulting, quantitative and qualitative organizational surveys to understand the actual state of an organization are indispensable. To achieve the sustainable organizational growth that comes from the growth of each individual employee, it is desirable to be able to visualize the state of the organization from these survey data and use them for prediction and decision making. In this case study, we apply JMP's multivariate analysis capabilities and a methodology built on visualization tools called analysis model diagrams and structural model diagrams to data obtained at Company A. As a result, it becomes possible to make clear, easy-to-understand proposals to Company A's management. The presentation walks through the entire flow from data acquisition to proposal, covering analysis methods that go one step beyond ordinary descriptive statistics.   After working at a marketing company, a provider of EAP (Employee Assistance Program) services, and a venture company, he became independent as an organizational and HR consultant. While engaged in organizational and human resource development work, he entered a doctoral program as a working graduate student and conducted research on analysis and design based on questionnaire surveys and questionnaire experiments. Since completing the program he has continued to combine corporate practice with research, mainly on social science themes, and is currently a specially appointed lecturer at the College of Business Management, J. F. Oberlin University, a director of the NPO GEWEL, and the representative of FREELY LLC, developing theory and applying it in practice. http://researchmap.jp/sho-kawasaki/   高橋 武則: For nearly 50 years he has conducted research on QM (quality management), SQM (statistical quality management), and design theory. Since the start of the 21st century he has proposed the design paradigm Hyper Design, developed its underlying mathematics, HOPE theory, and has been jointly developing its supporting software, the HOPE Add-in, with SAS. He realizes a new design method through the trinity of Hyper Design as the way of thinking, HOPE theory as the statistical mathematics, and HOPE-Add-in for JMP as the supporting tool. As a social science extension of this theory he has proposed multi-group principal component regression analysis.   橘 雅恵: Since opening her own firm as a certified social insurance and labor consultant, she has focused on building HR systems, supporting more than 80 companies. She believes that building the best system for each company requires organizational culture diagnosis and compensation analysis based on employee interviews and employee surveys. She founded Japan Consulting Firm, a group of specialists supporting management as a whole, aiming for a team that can identify causal relationships based on data rather than experience and intuition alone, extract well-targeted management issues, and propose improvements in business performance and organizational development.
Level: Intermediate. JMP is probably the most powerful and most systematically organized software for reliability data analysis. Using JMP's Reliability and Survival platforms, this presentation systematically introduces methods for analyzing lifetime data, covering univariate distributions, bivariate relationships, prediction, and modeling within the time allowed. For modeling in particular, the plan is to introduce reliability activities and the data analysis process through hypothetical examples that do not stray far from real cases, including the renewal theorem and methods used in reliability testing.   廣野 元久: Joined Ricoh Company, Ltd. in 1984. Since then he has worked on quality management and reliability within the company and on promoting statistics education. After serving as head of the QM Promotion Office in the Quality Division and head of the SF Business Center, he is in his current position. Part-time lecturer at the Faculty of Engineering, Tokyo University of Science (1997-1998), and the Faculty of Policy Management, Keio University (2000-2004). His main specialties are statistical quality control and reliability engineering. His books include 「グラフィカルモデリングの実際」, 「JMPによる多変量データの活用術」, 「アンスコム的な数値例で学ぶ統計的計算方法23講」, 「JMPによる技術者のための多変量解析」, and 「目からウロコの多変量解析」.   遠藤 幸一: Joined Toshiba Corporation in 1987. After product and process development of power ICs (power supply ICs, 500 V driver ICs for motors, and others), he now works on failure analysis technology development. Ph.D. (Information Science), Osaka University.   No handout materials are available.
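As a minimal sketch of the univariate piece of such an analysis, the JSL below launches the Life Distribution platform on hypothetical columns :Hours (time to failure or suspension) and :Censor (1 = censored); the option names follow the form typically saved by the platform and should be treated as assumptions.

dt = Current Data Table();
ld = dt << Life Distribution(
    Y( :Hours ),
    Censor( :Censor ),
    Censor Code( 1 )      // value that marks right-censored observations
);
// Compare candidate lifetime distributions (Weibull, lognormal, ...) in the report,
// then use the profilers for prediction.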
Level: Intermediate. With JMP (Pro), analysis is far easier to enjoy than with R or Python. It may not be fully bespoke analysis, but it handles semi-custom work more than adequately. With JMP the following are all easy: (1) analysis with just a mouse, no commands to type; (2) graphs and statistics always presented together; (3) the analysis process can be saved as a script; (4) reports can be produced that follow the flow of the analysis; (5) because statistical thinking underlies the product, it is ideal for systematic understanding and learning. This presentation uses numerical examples to discuss prediction and classification with JMP, covering methods such as kernel smoothing, SVM, and neural network discrimination, and deepens understanding by contrasting them with conventional statistical multivariate analysis.   Joined Ricoh Company, Ltd. in 1984. Since then he has worked on quality management and reliability within the company and on promoting statistics education. After serving as head of the QM Promotion Office in the Quality Division and head of the SF Business Center, he is in his current position. Part-time lecturer at the Faculty of Engineering, Tokyo University of Science (1997-1998), and the Faculty of Policy Management, Keio University (2000-2004). His main specialties are statistical quality control and reliability engineering. His books include 「グラフィカルモデリングの実際」, 「JMPによる多変量データの活用術」, 「アンスコム的な数値例で学ぶ統計的計算方法23講」, 「JMPによる技術者のための多変量解析」, and 「目からウロコの多変量解析」.   No handout materials are available.
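One of the methods mentioned, neural network discrimination, can be launched with a few lines of JSL. The sketch below uses the Big Class sample table and a single hidden layer; it is only meant to show how little scripting is needed, not the settings used in the talk.

dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
nn = dt << Neural(
    Y( :sex ),                 // classification target
    X( :height, :weight ),
    Fit( NTanH( 3 ) )          // one hidden layer with three tanh nodes
);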
Level: Intermediate. A survey designed to improve the current situation needs questions about both outcomes and causes. If the causal relationship between the two is captured with regression analysis, the items on which action should be taken can be selected. Items that were not asked cannot be recovered after the survey, whereas items that turn out to be unnecessary can simply be ignored afterward. For this reason, when a comprehensive set of questions is prepared while taking care not to overburden respondents, the number of items becomes large and high correlations appear among them. To address this problem, Takahashi and Kawasaki have proposed multi-group principal component regression analysis. Its essence is as follows: form rational groups of items so that correlations are high within groups and low between groups; obtain principal components within each group; use them as explanatory variables in a principal component regression and select the important components; and then select, as the key items on which to act, those with large absolute factor loadings on the selected components. Sometimes these key items cluster densely, which can be handled with factor analysis. Causal analysis using principal components is called front-side causal analysis, analysis using factors is called back-side causal analysis, and their combination is called two-sided causal analysis. This presentation introduces the theory behind these methods and concrete procedures using JMP; a sketch of the mechanical steps follows below.   The author has spent half a century on research and practice in QM (quality management), SQM (statistical quality management), and design theory (including joint research with many companies and management guidance). Since the 1990s he has proposed the new design paradigm Hyper Design, studied its underlying mathematics, HOPE theory (Hyper Optimization for Prospective Engineering), and jointly developed its supporting software, HOPE-Add-in for JMP, with SAS. The new design method is thus realized through the trinity of Hyper Design as the way of thinking, HOPE theory as the statistical mathematics, and HOPE-Add-in for JMP as the supporting tool.  Because design has a high barrier to entry, it is often misunderstood as a special activity for special people. To break through this and enable many people to master design, the author has developed new educational methods along with the theoretical research: hands-on education using physical teaching materials (paper helicopters, paper gliders, coin shooting, and more) and virtual materials (a ball-flight simulator, among others). Because this educational program (from basic statistics to Hyper Design) uses JMP for visualized analysis and design in many situations, it is both easy to understand and enjoyable. The program has been run for more than 30 years at universities in Japan and abroad (Keio University, Yale University, Tokyo University of Science, University of Tsukuba, and others) and at many companies, confirming its effectiveness.   For those who would like a more detailed understanding, a paper is available; please contact the presenter, 高橋武則, directly.
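The mechanical steps of the method (principal components within one group of items, then a regression on the saved components) can be sketched in JSL as below. The grouping of items is the substantive judgment and is not shown; the column names (:q1 to :q3 for one item group, :Outcome for the result variable) and the names of the saved component columns are assumptions, not the authors' actual script.

dt = Current Data Table();
pca = dt << Principal Components( Y( :q1, :q2, :q3 ) );   // components within one item group
pca << Save Principal Components( 2 );                    // adds component columns to the table
// Repeat for the other item groups, then regress the outcome on the saved components:
fm = dt << Fit Model(
    Y( :Outcome ),
    Effects( :Prin1, :Prin2 ),                            // names of the saved columns (assumed)
    Personality( "Standard Least Squares" ),
    Run
);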
Level: Intermediate. With work-style reform in the air, making experiments and analysis more efficient has become increasingly important. To demonstrate the power of JMP for efficient experimentation, a good approach is to recast existing experimental data as a definitive screening design (DSD) or a custom design and show how drastically the number of runs can be reduced, picking the response values out of the existing data. When experimental data are found split across several tables, combine them into one table, run a multivariate analysis, and visualize it with the profiler so that people notice the pitfalls of the OFAT (One Factor at a Time) approach. For the habit of analyzing replicated experiments through their averages, show that there are alternatives: stacking the data (see the sketch below), multi-objective optimization using mean and variance, and robust optimization. When design of experiments is used in development, the presence of interactions often cannot be predicted in advance, and interactions are by no means rare. A DSD has no confounding between main effects and two-factor interactions (2FI) or among 2FIs, and requires only about twice as many runs as factors, which is a major advantage. I will report on what I have learned from actually using DSDs: the breakdown that occurs when the number of main effects plus interaction terms approaches the number of factors, how augmented designs resolve it, and the practically important points from DSD-related papers obtained from the JMP Community and ASQ.   After serving at Yamatake-Honeywell (now Azbil) as FA Development Department Manager, Executive Officer and Head of the R&D Division, Executive Officer and Head of the Quality Assurance Promotion Division, and advisor at Azbil Kimmon, he founded 東林コンサルティング. His areas of expertise include yield and quality improvement through analysis of production data; statistical problem solving in general, including field-failure prediction, robust design, design optimization, and design of experiments; and shop-floor guidance on design review, root cause analysis (RCA), prevention of human error, and process improvement. His books include 『ネットビジネスの本質』 (日科技連出版, 2001, co-authored; Telecom Social Science Award), 『実践ベンチャー企業の成功戦略』 (中央経済社, 2011, co-authored), and 『よくわかる「問題解決」の本』 (日刊工業新聞社, 2014, sole author). A principal paper is 「生産ラインのヒヤリハットや違和感に関する気づきの発信・受け止めを促進するワークショップの提案」, Japanese Society for Quality Control, 2016 (2016 Quality Technology Award). Major talks include 「作業ミスを誘発する組織要因を可視化し改善を促進する仕組みの提案」 (Discovery-Japan 2018) and 「JMPによる品質問題の解決~製造業の不良解析と信頼性予測~」 (Discovery-Japan 2019).
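The stacking step mentioned above can be done with a single JSL message; the replicate columns :Y1 to :Y3 are hypothetical and stand for a one-row-per-run table with repeated measurements in separate columns.

dt = Current Data Table();
stacked = dt << Stack(
    Columns( :Y1, :Y2, :Y3 ),
    Source Label Column( "Replicate" ),
    Stacked Data Column( "Y" )
);
// Analyzing Y in the stacked table keeps the run-to-run spread visible instead of
// collapsing each run to its average.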
Level: Intermediate. Specific health checkups in Japan target people aged 40 to 74 with the aim of reducing the number of people who meet the criteria for metabolic syndrome, from the standpoint of preventing lifestyle-related diseases. A summary of checkup results across all examinees can serve as a benchmark for comparing an individual with the whole, and should therefore be useful for the personal health management of middle-aged and older adults. The NDB Open Data provided by the Ministry of Health, Labour and Welfare includes, as specific health checkup information, annual means and class-interval distributions for test items (waist circumference, blood glucose, blood pressure, and so on), so for several of the items used to judge metabolic syndrome, the number of examinees and the number outside the reference range can be obtained by attributes such as sex and age group. Graphing the proportion outside the reference range by attribute yields interesting results; for some test items, for example, no clear trend across age groups appears. In this presentation, along with these graphs, I show the results of fitting a generalized linear model to the proportion outside the reference range for each test item, with fiscal year, prefecture, sex, and age group as factors. This modeling makes it possible to predict the out-of-range proportion for examinees with a given combination of attributes (sex, age group, prefecture of residence) and to understand the full population of checkup examinees more deeply.   A technical engineer in the JMP Japan division. He currently does presales work for JMP products, mainly for pharmaceutical and food companies. Although his role is to introduce JMP to customers, he strongly regards himself as a JMP user as well. In recent years he has posted JMP analyses of topics in the news to his blog and to JMP Public, a site for sharing analysis reports. https://public.jmp.com/users/259
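A hedged sketch of the kind of model fit described above, assuming the NDB-derived table has one row per year, prefecture, sex, and age group with :N_out (examinees outside the reference range) and :N_tested (examinees tested). A binomial generalized linear model with a logit link is one natural choice; all names and option keywords here are assumptions rather than the presenter's actual script.

dt = Current Data Table();
glm = dt << Fit Model(
    Y( :N_out, :N_tested ),                 // events / trials form for a binomial response
    Effects( :Year, :Prefecture, :Sex, :AgeGroup ),
    Personality( "Generalized Linear Model" ),
    GLM Distribution( Binomial ),
    Link Function( Logit ),
    Run
);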
Marcus Soerensen, Head of Quality & Six Sigma, Envases   This case study will show how the quality of a process was improved by using statistics as a common language between various departments in the company. What started out as a "confused" project with a very limited set of data turned out to be a successful project as the team began structured and data-oriented progress using JMP. This case illustrates how various departments can work together using data and statistics as the foundation for process improvements.     Auto-generated transcript...   Speaker Transcript Marcus Welcome to this presentation here, using SAS JMP. The topic for this presentation is how to use statistics as a common language and how to improve process quality by doing so. I'm working as a quality manager for a company called Envases. Envases is a huge company, a global company making cans for the food industry. You can see some of the products here in the in the Left corner. Some of the products may be familiar to you. It's cans you can buy in the supermarkets, we have meat, have fish, we have milk powder, we have also juice and so on in the cans. We have manufacturing facilities in Mexico and in Europe and the topic that I want to discuss here today is related to our production facilities in Denmark. Basically what I'm going to present here is a problem that has been solved using SAS JMP as one of the tools. And I'm going to spend some time in the JMP to to show some of the functionalities and to give an idea of how SAS JMP can can be used to work with these kind of problems, which I'm quite sure is very familiar to many of the people, seeing this presentation. Well, going back to last year, we had some issues in our production lines and the people in the production came to me. I'm the head of quality and I'm dealing with these kind of problems that came to my desk saying, Marcus, we have some some tin dust in our production lines. Tin is the material that we're using for cans, so tin dust as such it's not it's not around things to see in our production, but people were complaining that they could see too much tin dust in our production lines. And, as a result of that, they need to clean the lines all the time. That was, like the beginning of the problem. We didn't have any data to to show this. We didn't have any paper to go through. We didn't have any Excel sheets. We didn't have any JMP files to to look into. We just had some opinions from the people saying we have too much tin dust on our production line, and we need to clean all the time. And to make it even up a bit more confusing, but people were saying we have seen this many times before, and some people were saying that we have never seen this before. I started to speak with the people on the on the line, saying, is this new to you? Has this happened this week or was it last week? Or what about last year? And I got a lot of different opinions and a lot of inputs, and also people were not telling me the same thing, so I was a bit confused about how could we get started with this project. But anyway, we needed to get started. So the first thing I did was to set the team, saying, people here in the organization will need to set a team. Who should be involved in team? And we pinpointed some some people in a in the production, some technical people, some people from the operation, some people for the maintenance department and so on, relevant people for the project. 
And the structure that we have used here to solve the problem is inspired from the Six Sigma DMAIC, maybe some of you know it already. DMAIC is about defining the problem, measuring the problem, analyzing the reason for the problem, improve and then control, if you actually succeeded with the with the solution that you came up with. One of the idea with this, DMAIC, and one of the idea, also for me, using the structure here is to define and measure the problem, because sometimes you may be thinking you have a problem, but by starting to measure the problem you may realize that what you think was a problem is actually not a problem at all. So there's no need to to continue solving the problem that is actually not existing, and that was what I told people saying, Okay, first of all, let's measure the problem. Do we even have a problem, or is it just a stomach feeling that we are having in the production? That was the first step. So we went to the to the production line ???, going to the line to see, can we have an idea of the process, maybe we can even see where the problem occurs. The process is is is rather simple. I've tried to simplify it here in this very simple drawing, just to give you an idea. We have tin sheets coming in to a conveyor and then we have a piston making the lid for the cans, so the process is rather simple. As a result of this piston and of making these lids, we can see that we can collect some dust on the conveyor, and this is where the operator needs to clean once in a while. And we could see by doing the ??? that after 25,000 lids, we could collect about or more than two kilograms of tin dust. And that was when we needed to clean the line. So that was like our baseline, say when we had produced 25,000 lids equal to more than two kilograms, we need to clean the line, and that was not acceptable because cleaning is taking up production time and having reduced production time we cannot produce the lids that we want to produce. That means that we maybe will be late for the delivery, or maybe we are not able to deliver at all. So we had, all of us, an interest of reducing this cleaning and we could see that we can reduce your cleaning if we can reduce the tin dust produced at the line. So the problem was pretty simple when you when we have collected the the data in kilograms here. Remember that you can see the back here on the picture, this is the dust that we could collect using a vacuum cleaner after 25,000 lids. In the beginning, when people came to me we didn't even have a baggage with the dust, it was just it was just by watching, we could express the problem that we were having. Now at least we could see that this is the problem that we're having. We have too much dust and this is too much dust, because we need to clean it. So we have to reduce the kilograms after 25,000 lids produced, that was the success criteria of the project. By collecting the right people, we took a like a workshop, put all the people in a room saying we have a problem here. After 25,000 lids we produce more than two kilograms of dust. What is the reason for that? And then we did a brainstorming, saying look at the process, looking at what is coming in the process, what is coming out the process. Can any of these variables explain the recent for the tin dust that we see? So step one was to define the variables. And you can see, I have marked here in yellow what the team expected to be some of the root causes for the tin dust that we saw. 
We started with the input that since it's coming in, we know that the tin sheet, the thickness of the tin sheet can can can vary, so we can have some thicker tin sheet coming in and some not so thin thin sheet coming in. And we also, we could see that the coating of the piston could be a reason for the tin coat. We could see, compared to other lines that had some coat, didn't have the problem in the same scale. We could also see that the measurements of the piston, we have four different measurements on the piston that could explain the reason for the tin dust. We have never tried this out, so this was just on the paper, so this could be the reason but let's try to find out. Last year or two years ago, when we have a problem like this, the approach would normally be that we were trying different things out, so we could try and make the thickness could see if this could change anything, the coat. But here we would like to combine all the variables in one experiment simply to to speed up the process. So we set up a design of experiments, a DOE. Over time we have to change this a bit, so it could reflect the reality that we have and also the allocated production times that we could use for the experiment. Setting up the experiments and defining the variables was not a difficult task. It took maybe a couple of hours. Setting up the DOE was not difficult, we did that in SAS JMP. But executing the DOE was the tricky part because it took a long time; it took about a week. So we need to plan to take out the machine and then we did the trial for about five days. And simply what we did, we produced 25,000 lids using one kind of setting and then 25,000 lids using another kind of setting and so on. And then, after the week, we analyzed the results, and then we concluded based on these results. Let me try to show you what we did in SAS JMP. You can see here this, just by looking at the numbers, we could see that this is a huge progress since our starting point. We started by just having people say we have too much dust and we need to clean all the time. Now at least we have some number a number...numbers on the on the tin. You can see here, we have the tin dust. This is the produced tin dust after 25,000 lids, and we also have different settings of the thickness of the material coming into the line. And we have the four different measurements of the piston here. And we have a statement, has it been coated or has not been coated. So we have different kind of pistons that we were trying out. This is rather easy for people to understand. They could they could see how much tin dust do we have if the thickness of the material coming in, is 5.74, if the measurement is 1.47 and so on. So this is a huge step from coming from just watching to actually have some real numbers behind the working set that we were having in the beginning. So just collecting the number here was a huge progress from our starting point, but the idea was to use the number to see can we explain the reason for the tin dust based on the thickness of the material coming in, the four different measurements of the piston, and if the piston has been coated or not. We're using some of the tools in SAS JMP and one of the tools that we're using a lot here is the fit model. The fit model explains if there will be any relationships between your responses and your variables. And up here we have the response. This is our tin dust. 
Here we have the model, so we would like to see if the thickness of the material affects the tin dust, the measurement of the piston, and if the coating of the piston would have any impact. Running the model here and saying, try to combine the different variables and tell me what will have the highest impact on our tin dust. And this is basically the results that we got out of it. We could see here that we're using the p value to guide us on whether this makes sense for us. The coat has a low p value, meaning, well, it seems like we have a significant relationship between the coat and our tin dust. We also believe that the thickness of the material will have a relationship with our tin dust, and we believe that the Measurement 3 will have some kind of relationship with the tin dust. This means also that the Measurement 1, 2 and 4 don't seem to be significant when we talk about the tin dust. Remember that we were starting from just working without any kind of number, so now we were talking about P values and how this can help us. And this was actually quite easy for us to interpret, and people did understand, okay significant means that this maybe is not a coincidence. And by using the right people, we could verify this makes sense for them as well, so it seems like the coating can have an impact on the tin dust. And the technic... technical staff were saying, yeah, it makes sense that the coating will impact on the tin dust, because we have seen this on other lines. And the thickness could be verified to make sense and the Measurement 3 could make sense, so we started to believe that these are some good guidelines for us, but we need to see yeah, we can see that the coat seems to be significant, but is it with or without coating that is relevant for us? So we expanded this... the fit model here and you can see here in the profiler how the different variables will impact the tin dust. You can see the tin dust here to the left. And you can see the thickness of the sheet coming in the process here, the Measurement 3 and then the coat. And then we could try to simulate if we have the coating, if we have Measurement 3 on, what would be the expected kilograms coming out of the process? Here it's saying we can expect 0.0057. We also have a confidence interval here, but we can expect that this will be the amount of tin dust coming out of the process. Then look and see what if we have a piston that is not coated. You can see that it will change significantly, and it will be higher. And we know from our start that around 0.2 will be like the game changer; if we have more than 0.2 kilograms of tin dust, we need to clean, so we want to be lower than 0.2. And what we did to go even further here, because we know that the thickness of the material was very difficult for us to control, this is specified and there will be some variation within the thickness, which would be very difficult for us to change. So we needed to have a very robust process, saying we need to keep the thickness flexible. But what we can control is the Measurement 3 and the coating. So we expanded this profiler to the simulator so that we could simulate what if the thickness will change with some specified standard deviation. So we're saying we know the thickness can change. We know that we have a mean around 5.595 and we have a standard deviation of this material equal to 0.058. We could change this later on. We want to fix the measurement and we want to fix the coating, having the coating on the piston. 
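The fit just described can be expressed in a few lines of JSL. The column names below (:Tin Dust, :Thickness, :Measurement 3, :Coat) simply mirror the names used in the talk and are assumptions, not the actual project table.

dt = Current Data Table();
fit = dt << Fit Model(
    Y( :Name( "Tin Dust" ) ),
    Effects( :Thickness, :Name( "Measurement 3" ), :Coat ),
    Personality( "Standard Least Squares" ),
    Emphasis( "Effect Screening" ),   // this emphasis shows the prediction profiler by default
    Run
);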
We also know that our target is not higher than 0.2 so we could add a target in here. And then we could simulate. The Measurement 3 will be fixed, the coat will be fixed, but the thickness will change over time. Then we can simulate if we want 5,000 sheets in the process, what could we expect to see in the tin dust? And then we could simulate. You can see if we have a tin sheet coming in having a thickness of about 5.6, the Measurement 3 at 1 and with a coat, we could expect a very good result. And you can see, we have the red line here at 2... sorry, 0.2, and we can also see hfere that the rate of defect, meaning that rate of measurements higher than 0.1 would be 0, so this is good for us. You can also see that if we then change it, the Measurement 3, not at 1, but to 2.5. And with the simulation again. We will be at a slightly higher tin dust amount. If we, on top of that, sorry, change it to a piston with no coating and run 5,000 sheet plate, we could expect a very poor response. And we could see that the setup that we had in our production line before we started these changes were pistons without any coat. So this was very new to us, and it was very exciting for us to see that we can actually see what could control the the tin dust and we can even control it ourselves. So, based on the simulations that we have here, we decided to have a Measurement 3 on 2 and then a piston with a coating, meaning that we will have a very robust process that gets...that can handle the variation that we have in our thickness of the sheets. Then we could also try to to to simulate thing, what if this deviation will not be 0.058 but will be 1.5. You see, this will be too much, so this is some of the agreements that we have with our suppliers that they need to keep the standard deviation at a specific level because then we will have a robust process. So this was like a an eye opener for all of us since this is very, very good picture for us, and it hasn't cost us a lot. The only thing it has cost us was the experiment that we set for five days. So this was good news for us. And so here on number five, we concluded saying, Okay, now we know what should be improved, we need to coat the tools or the piston and we need to adjust the measurement to 2.0. It was coming from about 3.0. So this was ??? an improvement, and this is a very simple picture from the profiler, showing the relationship between our tin coat and the different variables that we expected. And we did a control saying let's try to change our processes and then produce 25,000 more lids and then you can see the comparison. This is was our like our baseline. You can see, the small dust here in the baggage before we change the process, and you can see the 25,000 of the tin dust after producing 25,000 lids after the change and we didn't see any tin dust here. So this was like...it was very good proof of concept for us. It was one of the first projects that we did using SAS JMP and it was a very good proof of concept and people did really rely on this way of doing problem solving. What we learn here is the use of data, it shows benefit for us, because then we have something in common, something that we can relate to everybody, instead of having different opinions that is very difficult for us to quantify, so the use of this was very helpful for all of us. 
And using the right people, the technical people, people with knowledge from the line, also people who can use a statistical software like JMP, understand how to set up an experiment, understand how to do a fit model, regression models and so on. And then the use of SAS JMP is truly powerful. We could not have done this in, like, Excel because it doesn't have the tools. And then, using a structured process like DMAIC was very powerful for us, so this was a very good learning for us, and this is something that we have implemented in many projects afterwards with very good response.  
Christopher Gotwalt, JMP Director of Statistical R&D, SAS   There are often constraints among the factors in experiments that are important not to violate, but are difficult to describe in mathematical form. These constraints can be important for many reasons. If you are baking bread, there are combinations of time and temperature that you know will lead to inedible chunks of carbon. Another situation is when there are factor combinations that are physically impossible, like attaining high pressure at low temperature. In this presentation, we illustrate a simple workflow of creating a simulated dataset of candidate factor values. From there, we use the interactive tools in JMP's data visualization platforms in combination with AutoRecalc to identify a physically realizable set of potential factor combinations that is supplied to the new Candidate Set Design capability in JMP 16. This then identifies the optimal subset of these filtered factor settings to run in the experiment. We also illustrate the Candidate Set Designer's use on historical process data, achieving designs that maximize information content while respecting the internal correlation structure of the process variables. Our approach is simple and easy to teach. It makes setting up experiments with constraints much more accessible to practitioners with any amount of DOE experience.       Auto-generated transcript...   Transcript Hello Chris Gotwalt here. Today, we're going to be constructing the history of graphic paradoxes and oh wait, wrong topic. Actually we're going to be talking about candidate set designs, tailoring DOE constraints to the problem. So industrial experimentation for product and process improvement has a long history with many threads that I admit I only know a tiny sliver of. The idea of using observation for product and process innovation is as old as humanity itself. It received renewed focus during the Renaissance and Scientific Revolution. During the subsequent Industrial Revolution, science and industry began to operate more and more in lockstep. In the early 20th century, Edison's lab was an industrial innovation on a factory scale, but it was done, to my knowledge, outside of modern experimental traditions. Not long after R.A. Fisher introduced concepts like blocking and randomization, his associate and then son in law, George Box, developed what is now probably the dominant paradigm in design of experiments, with the most popular book being Statistics for Experimenters by Box, Hunter and Hunter. The methods described in Box, Hunter and Hunter are what I call the taxonomical approach to design. So suppose you have a product or process you want to improve. You think through the things you can change. The knobs you can turn, like temperature, pressure, time, ingredients you can use or processing methods that you can use. These things become your factors. Then you think about whether they are continuous or nominal, and if they are nominal, how many levels they take or the range you're willing to vary them. If a factor is continuous, then you figure out the name of the design that most easily matches up to the problem and resources that you...that fits your budget. That design will have... will have a name like a Box Behnken design, a fractional factorial, or a central composite design, or possibly something like a Taguchi array. 
There will be restrictions on the numbers of runs, the level...the numbers of levels of categorical factors, and so on, so there will be some shoehorning the problem at hand into the design that you can find. For example, factors in the BHH approach, Box Hunter and Hunter approach, often need to be whittled down to two or three unique values or levels. Despite its limitations, the taxonomical approach has been fantastically successful. Over time, of course, some people have asked if we could still do better. And by better we mean to ask ourselves, how do we design our study to obtain the highest quality information pertinent to the goals of the improvement project? This line of questioning lead ultimately to optimal design. Optimal design is an academic research area. It was started in parallel with the Box school in the '50s and '60s, but for various reasons remained out of the mainstream of industrial experimentations, until the custom designer and JMP. The philosophy of the custom designer is that you describe the problem to the software. It then returns you the best design for your budgeted number of runs. You start out by declaring your responses along with their goals, like minimize, maximize, or match target, and then you describe the kinds of factors you have, continuous, categorical mixture, etc. Categorical factors can have any number of levels. You give it a model that you want to fit to the resulting data. The model assumes at least squares analysis and consists of main effects and interactions in polynomial terms. The custom designer make some default assumptions about the nature of your goal, such as whether you're interested in screening or prediction, which is reflected in the optimality criterion that is used. The defaults can be overridden with a red triangle menu option if you are wanting to do something different from what the software intends. The workflow in most applications is to set up the model. Then you choose your budget, click make design. Once that happens, JMP uses a mixed, continuous and categorical optimization algorithm, solving for the number of factors times the number of rows terms. Then you get your design data table with everything you need except the response data. This is a great workflow as the factors are able to be varied independent from one another. What if you can't? What if there are constraints? What if the value of some factors determine the possible ranges of other factors? Well then you can do....then you can define some factor constraints or use it disallowed combinations filter. Unfortunately, while these are powerful tools for constraining experimental regions, it can still be very difficult to characterize constraints using these. Brad Jones' DOE team, Ryan Lekivetz, Joseph Morgan and Caleb King have added an extraordinarily useful new feature that makes handling constraints vastly easier in JMP 16. These are called candidate or covariate runs. What you can do is, off on your own, create a table of all possible combinations of factor settings that you want the custom designer to consider. Then load them up here and those will be the only combinations of factor settings that the designer will... will look at. The original table, which I call a candidate table, is like a menu factor settings for the custom designer. This gives JMP users an incredible level of control over their designs. 
What I'm going to do today is go over several examples to show how you can use this to make the custom designer fulfill its potential as a tool that tailors the design to the problem at hand. Before I do that, I'm going to get off topic for a moment and point out that in the JMP Pro version of the custom designer, there's now a capability that allows you to declare limits of detection at design time. If you want a non missing values for the limits here the custom designer will add a column property that informs the generalized regression platform of the detection limits and it will then automatically get the analysis correct. This leads to dramatically higher power to detect effects and much lower bias in predictions, but that's a topic for another talk. Here are a bunch of applications that I can think of for the candidate set designer. The simplest is when ranges of a continuous factor depend on the level of one or more categorical factors. Another example is when we can't control the range of factors completely independently, but the constraints are hard to write down. There are two methods we can use for this. One is using historical process data as a candidate set, and then the other one is what I call filter designs where you create...design a giant initial data set using random numbers or a space filling design and then use row selections in scatter plots to pick off the points that don't satisfy the constraints. There's also the ability to really highly customize mixture problems, especially situations where you've got multilayer mixturing. This isn't something that I'm going to be able to talk about today, but in the future this is something that you should be looking to be able to do with this candidate set designer. You can also do nonlinear constraints with the filtering method, the same ways you can do other kinds of constraints. It's it's very simple and I'll have a quick example at the very end illustrating this. So let's consider our first example. Suppose you want to match a target response in an investigation of two factors. One is equipped...an equipment supplier, of which there are two levels and the other one is the temperature of the device. The two different suppliers have different ranges of operating temperatures. Supplier A's is more narrow of the two, going from 150 to 170 degrees Celsius. But it's controllable to a finer level of resolution of about 5 degrees. Supplier B has a wider operating range going from 140 to 180 degrees Celsius, but is only controllable to 10 degrees Celsius. Suppose we want to do a 12 run design to find the optimal combination of these two factors. We enumerate all possible combinations of the two factors in 10 runs in the table here, just creating this manually ourselves. So here's the five possible values of machine type A's temperature settings. And then down here are the five possible values of Type B's temperature settings. We want the best design in 12 runs, which exceeds the number of rows in the candidate table. This isn't a problem in theory, but I recommend creating a copy of the candidate set just in case so that the number of runs that your candidate table has exceeds the number that you're looking for in the design. Then we go to the custom designer. Push select covariate factors button. Select the columns that we want loaded as candidate design factors. Now the candidate design is loaded and shown. Let's add the interaction effect, as well as the quadratic effect of temperature. 
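The 10-row candidate table just described can also be built with a short script rather than by hand. This is only a sketch of the enumeration step (machine A: 150 to 170 in steps of 5, machine B: 140 to 180 in steps of 10), not part of the original demo.

dt = New Table( "Temperature candidates",
    New Column( "Machine", Character ),
    New Column( "Temperature", Numeric )
);
For( t = 150, t <= 170, t += 5,
    dt << Add Rows( 1 );
    dt:Machine[N Rows( dt )] = "A";
    dt:Temperature[N Rows( dt )] = t;
);
For( t = 140, t <= 180, t += 10,
    dt << Add Rows( 1 );
    dt:Machine[N Rows( dt )] = "B";
    dt:Temperature[N Rows( dt )] = t;
);
// Load Machine and Temperature as covariate/candidate factors in the custom designer.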
Now we're at the final step before creating the design. I want to explain the two options you see in the design generation outline node. The first one, which will force in all the rows that are selected in the original table or in the listing of the candidates in the custom designer. So if you have checkpoints that are unlikely to be favored by the optimality criterion and want to force them into into the design, you can use this option. It's a little like taking those same rows and creating an augmented design based on just them, except that you are controlling the possible combinations of the factors in the additional rows. The second option, which I'm checking here on purpose, allows the candidate rows to be chosen more than once. This will give you optimally chosen replications and is probably a good idea if you're about to run a physical experiment. If, on the other hand, you are using an optimal subset of rows to find to try in a fancy new machine learning algorithm like SVEM, a topic of one of my other talks at the March Discovery Conference. You would not want to check this option if that was the case. Basically, if you don't have all of your response values already, I would check this box and if you already have the response values, then don't. Reset the sample size to 12 and click make design. The candidate design in all its glory will appear just like any other design made by the custom designer. As we see in the middle JMP window, JMP also selects the rows in the original table chosen by the candidate design algorithm. Note that 10 not 12 rows were selected. On the right we see the new design table, the rightmost column in the table indicates the row of origin for that run. Notice that original rows 11 and 15 were chosen twice and are replicates. Here is a histogram view of the design. You can see that the different values of temperature were chosen by the candidate set algorithm for different machine types. Overall, this design is nicely balanced, but we don't have 3 levels of temperature in machine type A. Fortunately, we can select the rows we want forced into the design to ensure that we have 3 levels of temperature for both machine types. Just select the row you want forced into the design in the covariate table. Check include all selected covariant rows into the design option. And then if you go through all of that, you will see that now both levels of machine have at least three levels of temperature in the design. So the first design we created is on the left and the new design forcing there to be 3 levels of machine type A's temperature settings is over here to the right. My second example is based on a real data set from a metallurgical manufacturing process. The company wants to control the amount of shrinkage during the sintering step. They have a lot of historical data and have applied machine learning models to predict shrinkage and so have some idea what the key factors are. However, to actually optimize the process, you should really do a designed experiment. As Laura Castro-Schilo once pointed... As Laura Castro-Schilo once told me, causality is a property not of the data, but if the data generating mechanism, and as George Box says on the inside cover of Statistics for Experimenters, to find out what happens when you change something, it is necessary to change it. Now, although we can't use the historical data to prove causality, there is essential information about what combinations of factors are possible that we can use in the design. 
We first have to separate the columns in the table that represent controllable factors from the ones that are more passive sensor measurements or drive quantities that cannot be controlled directly. A glance at the scatter plot of the potential continuous factors indicates that there are implicit constraints that could be difficult to characterize as linear constraints or disallowed combinations. However, these represent a sample of the possible combinations that can be used with the candidate designer quite easily. To do this, we bring up the custom designer. Set up the response. I like to load up some covariate factors. Select the columns that we can control as factor...DOE factors and click OK. Now we've got them loaded. Let's set up a quadratic response surface model as our base model. Then select all of the model terms except the intercept. Then do a control plus right click and convert all those terms into if possible effects. This, in combination with response surface model chosen, means that we will be creating a Bayesian I-optimal candidate set design. Check the box that allows for optimally chosen replicates. Enter the sample size. It then creates the design for us. If we look at the distribution of the factors, we see that it is tried hard to pursue greater balance. On the left, we have a scatterplot matrix of the continuous factors from the original data and on the right is the hundred row design. We can see that in the sintering temperature, we have some potential outliers at 1220. One would want to make sure that those are real values. In general, you're going to need to make sure that the input candidate set it's clear of outliars and of missing values before using it as a candidate set design. In my talk with Ron Kennet this...in the March 2021 Discovery conference, I briefly demo how you can use the outlier and missing value screening platforms to remove the outliers and replace the missing values so that you could use them at a subsequent stage like this. Now suppose we have a problem similar to the first example, where there are two machine types, but now we have temperature and pressure as factors, and we know that temperature and pressure cannot vary independently and that the nature of that dependence changes between machine types. We can create an initial space filling design and use the data filter to remove the infeasible combinations of factors setting separately for each machine type. Then we can use the candidate set designer to find the most efficient design for this situation. So now I've been through this, so now I've created my space filling design. It's got 1,000 runs and I can bring up the global data filter on it and use it to shave off different combinations of temperature and pressure so that we can have separate constraints by machine type. So I use the Lasso tool to cut off a corner in machine B. And I go back and I cut off another corner in machine B so machine B is the machine that has kind of a wider operating region in temperature and pressure. Then we switch over to machine A. And we're just going to use the Lasso tool to shave off the points that are outside its operating region. And we see that its operating region is a lot narrower than Machine A's. And here's our combined design. From there we can load that back up into the custom designer. Put an RSM model there, then set our number of runs to 32, allowing coviariate rows to be repeated. And it'll crank through. 
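The interactive row filtering shown here can also be expressed directly in JSL. The constraint below is a made-up example with hypothetical column names (:Machine, :Temperature, :Pressure), intended only to show the select-then-delete pattern on a space-filling candidate table.

dt = Current Data Table();    // the space-filling candidate table
// Drop candidate points that are infeasible for machine A (the narrower operating window)
dt << Select Where( :Machine == "A" & (:Temperature > 165 | :Pressure > 0.8) );
dt << Delete Rows( dt << Get Selected Rows );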
Once it's done that, it selects all the points that were chosen by the candidate set designer. And here we can see the points that were chosen. They've been highlighted and the original set of candidate points that were not selected are are are gray. We can bring up the new design in Fit Y by X and we can see a scatterplot where we see that the the the machine A design points are in red. They're in the interior of the space, and then the Type B runs are in blue. It had the wider operating region and that's how we see these points out here, further out for it. So we have quickly achieved a design with linear constraints that change with a categorical factor without going the annoying process of deriving the linear combination coefficients. We've simply used basic JMP 101 visualization and filtering tools. This idea generalizes to other nonlinear constraints and other complex situations fairly easily. So now we're going to use filtering and multivariate to set up a very unique new type of design that I assure you you have never seen before. Go to the Lasso tool. We're going to cut out a very unusual constraint. And we're going to invert selection. We're going to delete those rows. Then we can speed this up a little bit. We can go through and do the same thing for other combinations of X1 and the other variables. Carving out a very unusual shaped candidate set. We can load this up into the custom designer. Same thing as before. Bring our columns in as covariates, set up a design with all... all high order interactions made if possible, with a hundred runs. And now we see our design for this very unusual constrained region that is optimal given these constraints. So I'll leave you with this image. I'm very excited to hear what you were able to do with the new candidate set designer. Hats off to the DOE team for adding this surprisingly useful and flexible new feature. Thank you.  
Vince Faller, Chief Software Engineer, Predictum  Wayne Levin, President, Predictum   This session will be of interest to users who work with JMP Scripting Language (JSL). Software engineers at Predictum use a continuous integration/continuous delivery (CI/CD) pipeline to manage their workflow in developing analytical applications that use JSL. The CI/CD pipeline extends the use of Hamcrest to perform hundreds of automated tests concurrently on multiple levels, which factor in different types of operating systems, software versions and other interoperability requirements. In this presentation, Vince will demonstrate the key components of Predictum’s DevOps environment and how they extend Hamcrest’s automated testing capabilities for continuous improvement in developing robust, reliable and sustainable applications that use JSL: Visual Studio Code with JSL extension – a single code editor to edit and run JSL commands and scripts in addition to other programming languages. GitLab – a management hub for code repositories, project management, and automation for testing and deployment. Continuous integration/continuous delivery (CI/CD) pipeline – a workflow for managing hundreds of automated tests using Hamcrest that are conducted on multiple operating systems, software versions and other interoperability requirements. Predictum System Framework (PSF) 2.0 – our library of functions used by all client projects, including custom platforms, integration with GitLab and CI/CD pipeline, helper functions, and JSL workarounds.     Auto-generated transcript...   Speaker Transcript Wayne Levin Welcome to our session here on extending Hamcrest automated testing of JSL applications for continuous improvement. What we're going to show you here, our promise to you, is we're going to show you how you too can build a productive cost-effective high quality assurance, highly reliable and supportable JMP-based mission-critical integrated analytical systems. Yeah that's a lot to say but that's that's what we're doing in this in this environment. We're quite pleased with it. We're really honored to be able to share it with you. So here's the agenda we'll just follow here. A little introduction, my self, I'll do that in a moment, and just a little bit about Predictum, because you may not know too much about us, our background, background of our JSL development, infrastructure, a little bit of history involved with that. And then the results of the changes that we've been putting in place that we're here to share with you. Then we're going to do a demonstration and talk about what's next, what we have planned for going forward, and then we'll open it up, finally, for any questions that that you may have. So I'm Wayne Levin, so that's me over here on the right. I'm the president of Predictum and I'm joined with Vince Faller. Vince is our chief software engineer who's been leading this very important initiative. So just a little bit about us, right there. We're a JMP partner. We launched in 1992, so 29 years old. We do training in statistical methods and so on, using JMP, consulting in those areas and we spend an awful lot of time building and deploying integrated analytical applications and systems, hence why this effort was very important to us. We first delivered JMP application with JMP 4.0 in the year 2000, yeah, indeed over 20 years ago, and we've been building larger systems. Of course, since back then, it was too small little tools, but we started, I think, around JMP 8 or 9 building larger systems. 
So we've got quite a bit of history on this, over 10 years easily. So just a little bit of background...until about the second half of 2019, our development environment was really disparate, it was piecemeal. Project management was there, but again, everything was kind of broken up. We had different applications for version control and for managing time, you know, our developer time, and so on, and just project management generally. Developers were easily spending, and we'll talk about this, about half their time just doing routine mechanical things, like encrypting and packaging JMP add-ins. You know, maintaining configuration packages and, you know, and separating the repositories or what we generally call repo's, you know, for encrypted and unencrypted script. It was...there was a lot we hade to think about that wasn't really development work. It was really work that developer talent is...was wasted on. We also had, like I said, we've been doing it a long time, even at 2019, we had easily 10 years, so over 10 years of legacy framework going all the way back even to JMP 5, you know, with, you know, it was getting bloated and slow. And we know JMP has come a long way over the years. I mean in JMP 9, we got namespaces and JMP 14 introduced classes and that's when Hamcrest began. And it was Hamcrest that really allowed us to go this this...with this major initiative. So we began this major initiative back in August of 2019. And that's when we are acquired our first Gitlab licenses and that's the development of our new...the development of our new development architecture, there you go, started to take shape and it's been improving ever since. Every month, basically, we've been adding and building on our capabilities to become more and more productive, as we go forward. And and that's continuing, so we actually consider this, if you will, a Lean type of effort. It really does follow Lean principles and it's accelerated our development. We have automated testing, thanks to this system, and Vince is going to show us that. And we have this little model here, test early and test often And that's what we do. It supports reusing code and we've redeveloped our Predictum system framework. It's now 2.0. We've learned a lot from our earlier effort. All that's gone, pretty much all of its gone, and it's been replaced and expanded. And Vince will tell us more about that. Easily, easily we have over 50% increase in productivity, and I'm just going to say the developers are much happier. They're less frustrated. They're more focused on their work, I mean the real work that developers should be doing, not the tedious sort of stuff. There's still room for improvement, I'm going to say, so we're not done and Vince will tell us more about that. We have development standards now, so we have style guides for functions and all of our development is functionally based, you might say. Each function requires at least one Hamcrest test, and there are code reviews that the developers, they're sharing with one another to ensure that we're following our standards. And it raises questions about how to enhance those standards, make them better. We also have these, sort of, fun sessions, where developers are encouraged to break code, right, so they're called like, these break code challenges, or what have you. So it's become part of our modus operandi and it all fits right in with this development environment. It leads to, for example, further tests, further Hamcrest tests to be added. 
We have one small, fairly small project that we did just over a year ago. We're going into a new phase of it. It's got well over... well over 100 Hamcrest tests are built into it and they get run over and over and over again through the development process. So some other benefits is it allows us to assign and track our resource allocation, like what developers are doing what. Everyone knows what everyone else is doing, continuous integration, continuous deployment, something like that), there's...code collisions are detected early so if we have... and we do, we have multiple people working on some projects, so, you know, somebody's changing a function over here and it's going to collide with something that someone else is doing. We're going to find out much sooner. It also allows us to improve supportability across multiple staff. We can't have code dependent on a particular developer; we have to have code that any developer or support staff can support ging forward. So that's was an important objective of ours as well. And it does advance the whole quality assurance area just generally, including supporting, you know, FDA requirements, concerning QA, you know, things like validation, the IQ OQ PQ. So it's...we're automating or semi automating those tasks as well through this infrastructure. We do use it internally and externally, so you may know, we have some products out there, (???)Kobe sash lab but new ones spam well Kobe send spam(???) are talked about also elsewhere in the JMP Discovery European Conference in 2021. You might want to go check them out, but they're fairly large code bases and they're all developed, in other words, we eat our own dog food, if you know that expression, but we also use it with all of our client development, so this is something that's important to our clients, so because we're building applications that they're going to be dependent on. And so we, we need to...we need to have the infrastructure that allows us to be dependable, and anyway, that's a big part of this. I mentioned the Predictum system framework. You can see some snippets of it here. It's right within the scripting index, and you know, we see the arguments and the examples and all that. We built all that in and 95%, over 95% of them have Hamcrest tests associated with them. Of course, our goal is to make sure that all of them do and we're we're getting there. We're getting there. Have...these framework...this framework is actually part of our infrastructure here. That's one of the important elements of it. Another is just that...Hamcrest... the ability to do the unit testing. And I'm going to have...there's a slide at the...at the end, which will give you a link into the Community where you can learn more about Hamcrest. This is a development that was brought to us by by JMP, back in JMP 14, as I mentioned a few minutes ago. Gitlab is a big part of this; that gives us the project management repository, the CI/CD pipeline, etc. And also there's a visual...visual studio code extension for JSL that we created and we'd...you see five stars there because it was given five stars on the on the visual studio. I'm not sure what we call that. Vince, maybe you can tell us, the store, what have you. It's been downloaded hundreds of times and we've been updating it regularly. So that's something you can go and look for as well. I think we have a link for that as well in the resource slide at the end. So what I'm going to do now is I'm going to pass this over to Vince Faller. 
Vince is, again, our chief software engineer. Vince led this initiative, starting in August 2019, as I said. It was a lot of hard work and the hard work continues. We're all, in the company, very grateful for Vince and his leadership here. So with that said, Vince, why don't you take it from here? Vince Faller Sharing. So Wayne said Hamcrest a bunch of times. For people that don't know what Hamcrest is, it is an add-in created by JMP. Justin Chilton and Evan McCorkle were leading it. It's just a unit testing library that lets you run tests and get the results in an automated way. It really started the ball rolling of us being able to even do this, hence why it's called "extending." I'm going to be showing some stuff with my screen. I work pretty much exclusively in the VSCode extension that we built. This is VSCode. We do this because it has a lot of built-in or extendable functionality that we don't have to write, like Git integration and GitLab integration. Here you can see this is a JSL script and it reads it just fine. If you want to get it, if you're familiar with VSCode, it's just a lightweight text editor; you just type in JMP and you'll see it. It's the only one. But we'll go to what we're doing. So, for any code change we make, there is a pipeline run. We'll just kind of show what it does. So if I change the README file to "this is a demo for Discovery 2021," I'm just going to commit that. If you don't know Git, committing is just saying I want a snapshot of exactly where we are at the moment, and then you push it to the repo and it's saved on the server. Happy day. Commit message: more readme info. And I can just do git push, because VSCode is awesome. Pipeline demo. So now I've pushed it. There is going to be a pipeline running. I can just go down here and click this and it will give me my merge request. So now the pipeline has started running. I can check the status of the pipeline. What it's doing right now is going through and checking that it has the required Hamcrest files. We have some requirements that we enforce so that we can make sure that we're doing our jobs well. And then it's done. I'm going to press encrypt. Now encrypt is going to take the whole package and encrypt it. If we go over here, this is just a VM somewhere. It should start running in a second. So it's just going through all the code, writing all the encrypted passwords, going through, clicking all that stuff. If you've ever tried to encrypt multiple scripts at the same time, you'll probably know that that's a pain, so we automated it so that we don't have to do this because, as Wayne said, it was taking a lot of our time. Like, if we have 100 scripts to go through and encrypt every single one of them every time we want to do any release, it was awful. Because we have to have our code encrypted. All right, I can stop sharing that. So that's going to run. It should finish pretty soon. Then it will go through and stage it, and the staging basically takes all of the sources of information we want, as in our documentation, as in anything else we've written, and renders them into the form that we want in the add-in, because much like the rest of GitHub and GitLab, most of our documentation is written in markdown and then we render it into whatever we need. I don't need to show the rest of this, but yeah. So it's passing. It's going to go. We'll go back to VSCode. So.
If we were to change... so this is just a single function. If I go in here, like, if I were to run this... JSL, run current selection. So. You can see that it came back; all that it's trying to do is open Big Class, run a fit line, and get the equation. It's returning the equation. And you can actually see it ran over here as well. But this could use some more documentation. And we're like, oh, we don't actually want this data table open. But let's just run this real quick. And say, no, this isn't a good return; it returns the equation in all caps, apparently. So if I stage that. Better documentation. Push. Again back to here. So, again it's pushing. This is another pipeline. It's just running a bunch of PowerShell scripts in order, depending on however we set it up. But you'll notice this pipeline has more stages. In an effort to help us scale this, we only test the JSL minimally at first, and then, as it passes, we allow it to test further. And we only test it if there are JSL files that have changed. But we can go through this. It will run and it will tell us where it is in the testing, just in case the testing freezes. You know, if you have a modal dialog box that just won't close, obviously JMP isn't going to keep doing anything after that. But you can see, it did a bunch of stuff, yeah, awesome. I'm done. Exciting. Refresh that. Get a little green checkmark. And we could go, okay, run everything now. It would go through, test everything, then encrypt it, then test the encrypted version, basically the actual thing that we're going to make the add-in of, and then stage it again, package it for us, create the actual add-in that we would give to a customer. I'm not going to do that right now because it takes a minute. But let's say we go in here and we're, like, oh, well, I really want to close this data table. I don't know why I commented it out in the first place. I don't think it should be open, because I'm not using it anymore; we don't want that. We'll say okay. Close the dt. Again push. Now, this could all be done manually on my computer with Hamcrest. But you know, sometimes a developer will push stuff and not run all of their Hamcrest tests for everything on their computer, and the entire purpose of this is to catch that. It forced us to do our jobs a little better. And yeah. Keep clicking this button. I could just open that, but it's fine. So now you'll see it's running the pipeline again. Go to the pipeline. And I'm just going to keep saying this for repetition. We're just going through, testing, and encrypting, then testing, because sometimes encryption enters its own world of problems, if anybody's ever done encrypting. Run, run, run, run, run. And then, oh, we got a throw. Would you look at that? I'm not trying to be deadpan, but you know. So if we were to mark this as ready and say, yeah, we're done, we'd see, oh, well, that test didn't pass. Now we could download why the test didn't pass in the artifacts. And this will open a JUnit file that I'm just going to pull out here. It will also render it in GitLab, which might be easier, but for now we'll just do this. Eventually. Minimize everything. Now come on. So, we can see that something happened with R squared and it failed, inside of boo. So we can come here and say, why is there something in boo that is causing this to fail? We see, oh, somebody called our equation function and then they just assumed that the data table was there.
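As an illustration of the kind of function being demonstrated (open Big Class, fit a line, return the equation), here is a hedged JSL sketch. It is a stand-in, not Predictum's actual code: to keep it self-contained it computes the slope and intercept directly from the columns instead of reading them out of the Bivariate report, and it closes the table so nothing is left open (the bug discussed next). The check at the bottom assumes the JSL-Hamcrest add-in.

get big class equation = Function( {},
	{Default Local},
	dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
	x = dt:height << Get As Matrix;
	y = dt:weight << Get As Matrix;
	Close( dt, No Save );                    // do not leave the table open behind us
	// simple least squares fit of weight on height
	xbar = Sum( x ) / N Rows( x );
	ybar = Sum( y ) / N Rows( y );
	slope = Sum( (x - xbar) :* (y - ybar) ) / Sum( (x - xbar) :* (x - xbar) );
	intercept = ybar - slope * xbar;
	"weight = " || Char( Round( intercept, 4 ) ) || " + " || Char( Round( slope, 4 ) ) || " * height";
);

// Hamcrest-style check (JSL-Hamcrest add-in assumed) that the function cleans up after itself
n tables before = N Table();
eqn = get big class equation();
ut assert that( Expr( N Table() ), ut equal to( n tables before ) );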
So because something I changed broke somebody else's code, as if that would ever happen. So we're having that problem. Where did you go? Here we go. So that's the main purpose of everything we're doing here: to be able to catch the fact that I changed something and I broke somebody else's stuff. So I could go through, look at what boo does, and say, oh well, maybe I should just open Big Class myself. Yeah, cool. Well, if I save that, I should probably make it better. Open Big Class myself. I'll stage that. Open Big Class. Git push. And again, just show the old pipeline. Now this should take not too long. So we're going to go in here. We only test on one JMP version at first, but you can see, automatically, we only test on one. Then it waits for the developer to say, yeah, I'm done and everything looks good, continue. We do that for resource reasons, because these are running on VMs that are automatically just chugging all the time, and we have multiple developers who are all using these systems. We're also... you can see, this one is actually a Docker system; we're containerizing these. Well, we're in the process of containerizing these. We have them working, but we don't have all the versions yet. But we run 14.3, at least for this project; we run 14.3, 15, 15.1, and that should work. Let's just revert things, because that, you know, works. Probably should have done a classic...but it's fine. So yeah. We're going to test. I feel like I keep saying this over and over. We're going to test everything. We'll actually let this one run to show you kind of the end result of what we get. It should only take a little bit. And so we'll test this, make sure it's going, and you can see the logs. We're getting decent information out of what is happening and where it is; like it'll tell you the runner that is running. I'm only running on Windows right now. Again, this is a demo and all that, but we should be able to run more. While that's running, I'll just talk about VSCode some more. In VSCode, there are also snippets and things, so if you want to make a function, it will create all of the function information for you. We use Natural Docs, again, that was stolen from the Hamcrest team, as our development documentation. So it'll just put everything in a Natural Docs form. So again, the idea is helping us do our jobs and forcing us to do our jobs a little better, with a little more gusto. Wayne Levin For the documentation? Vince Faller So that's for the documentation, yeah. Wayne Levin As we're developing, we're documenting at the same time. Vince Faller Yep. Absolutely. You know, it also has snippets for for loops, while loops, For Each Row, stuff like that. Is this done yet? It's probably done, yep. So we get our green checkmark. Now it's going to run on all of the systems. If we can go back to here, you'll just see it. Open JMP. It'll run some tests, probably will open Big Class, then close itself all down. Wayne Levin So we're doing this largely because many of our clients have different versions of JMP deployed, and they want a particular add-in, but they have, you know, different versions out there in the field. We also test against the early adopter versions of JMP, which is a service to JMP because we report bugs. But also for the clients, it's helpful because then they know that they can upgrade to the new version of JMP. They know that the applications that we built for them have been tested.
And that's just routine for us. Good. Vince Faller You're done. You're done. You're done. Change to... I can talk about... And this is just going to run; we can movie magic this if you want to, Meg, just to make it run faster. Basically, I just want to get to staging but it takes a second. Is there anything else you have to say, Wayne, about it? Cool. I'll put that... Something I can say: when we're staging, we also have our documentation in MkDocs. So it'll actually run the MkDocs version, render it, put the help into the help files, and basically be able to create a release for us, so that we don't have to deal with it. Because creating releases is just a lot of effort. Encrypting. It's almost done. Probably should just have had one preloaded. Live demos, what are you gonna do. Run. Oh, one thing I definitely want to do. So, the last thing that the pipeline actually does is check that we actually recorded our time, because, you know, if we don't actually record our time spent, we don't get paid, so it forces us to do it. Great, great time. Vince Faller So the job would have failed without that. I can just show some jobs. Trying. That's the Docker one. We don't want that. So you can see that gave us our successes. No failures. No unexpected throws. That's all stuff from Hamcrest. Come on. One more. Okay, got to staging. One thing that it does is it creates the repositories. It creates them fresh every time, so it tries to keep it in a sort of stateless way. Okay, we can download the artifacts now. And now we should have this pipeline demo. I really wish it would have just went there. What. Why is Internet Explorer up? So now you'll see pipeline demo is a JMP add-in. If we unzip it... if you didn't know, a JMP add-in is just a zip file. If we look at that now, you can see that it has all of our scripts in it; it has our foo, it has our bar. If we open those, you can see it's an encrypted file. So this is basically what we would be able to give to the customer, without so much mechanical work. Wayne mentioned less frustrated developers, and personally, I think that's an understatement, because doing this over and over was very frustrating before we got this in place, and this has helped a bunch. Wayne Levin Now, about the encryption: when you're delivering an add-in for use by users within a company, you typically don't want, for security reasons and so on, anyone to be able to go in and deal with the code. So we may deliver the code unencrypted to the client, so the client has their own code unencrypted, but for delivery to the end user, you typically want everything encrypted, just so it can't be tampered with. Just one of those sort of things. Vince Faller Yep, and that is the end of my demo. Wayne, if you want to take it back for the wrap-up. Wayne Levin Yeah, terrific. Sure, thanks very much for that, Vince. So there are a lot of moving parts in this whole system, so it's, you know, basically making sure that we've got code being developed by multiple developers that is not colliding. We're building in the documentation at the same time. And actually, the documentation gets deployed with the application and we don't have to weave that in. We set the infrastructure up so that it's automatically taken care of. We can update that along with the code comprehensively, simultaneously, if you will.
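To make the "documenting as we develop" point concrete, here is a sketch of the kind of Natural Docs-style header the VSCode snippets can generate for a JSL function. Natural Docs is the real documentation tool mentioned above; the specific fields and the function shown are illustrative assumptions, not Predictum's actual template.

/*
	Function: Get Big Class Equation
		Opens Big Class, fits weight versus height, and returns the
		linear fit equation as a character string.

	Parameters:
		none

	Returns:
		A string of the form "weight = <intercept> + <slope> * height".
*/

Because the header sits in the source file, the staging step can render it straight into the deployed help, which is the point Wayne makes above.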
The Hamcrest tests that are going on: for each one of those functions that are written, there are expected results, if you will. So they get compared, and we saw, briefly, there was some problem with that equation there. An R square or whatever came back with a different value, so it broke, in other words, to say hey, something's not right here; I was expecting this output from the function for a use case. So that's one of the things that we get from clients: we build up a pool of use cases that get turned into Hamcrest tests and away we go. There are some other slides here that are available to you when you go and download the slides. So I'll leave that available for you, and here's a little picture of the pipeline that we're employing, and a little bit about code review activity for developers too, if you want to go back and forth with it. Vince, do you want to add anything here about how code review and approval takes place? Vince Faller Yeah, so inside of the merge request it will have the JSL code in the diffs of the code. And again, a big thank you to the people who did Hamcrest, as well, because they also started a lexer for GitHub and GitLab to be able to read JSL, so actually this is inside of GitLab, and it can also read the JSL. It doesn't execute it, but it has nice formatting. It's not all just white text; it's beautiful there. We just go in, like in this screenshot, you click a line, you put in the comment that you want, and it becomes a reviewable task. So we try to do as much inside of GitLab as we can for transparency reasons, and once everything is closed out, you can say yep, my merge request is ready to go. Let's put it into the master branch, main branch. Wayne Levin Awesome. So it's really helping; we're really defining coding standards, if you will, and I don't like the word enforcement, but that's what it amounts to. And it reduces variation. It makes it easier for multiple developers, if you will, to understand what others have done. And as we bring new developers on board, they come to understand the standard and they know what to look for, they know what to do. So it makes onboarding a lot easier, and again, everything's attached to everything here, so you know, supportability and so on. This is the slide I mentioned earlier, just for some resources. So we're using GitLab; I suppose the same principles apply to any Git host generally, like GitHub or what have you. Here's the Community link for Hamcrest. There was a talk in Tucson, that was in 2019, in the old days when we used to travel and get together. That was a lot of fun. And here's the marketplace link for Visual Studio Code. So as Vince said, yeah, we make a lot of use of that editor, as opposed to using the built-in JMP editor, just because it's all integrated. It's just all part of one big application development environment. And with that, on behalf of Vince and myself, I want to thank you for your interest in this, and again, we really want to thank the JMP team, Justin Chilton and company, I'll call out to you. If not for Hamcrest, we would not be on this. That was the missing piece, or the enabling piece, that really allowed us to take JSL development to, basically, the kinds of standards you expect in code development generally in industry.
So we're really grateful for it, and I know that that is propagated out with each application we've deployed. And at this point, Vince and I are happy to take any questions. Send them to info@predictum.com and they'll get forwarded to us and we'll get back to you. But at this point, we'll open it up to Q&A.
Mia Stephens, JMP Principal Product Manager, SAS   Predictive modeling is all about finding the model, or combination of models, that most accurately predicts the outcome of interest. But, not all problems (and data) are created equal. For any given scenario, there are several possible predictive models you can fit, and no one type of model works best for all problems. In some cases a regression model might be the top performer; in others it might be a tree-based model or a neural network. In the search for the best-performing model, you might fit all of the available models, one at a time, using cross-validation. Then, you might save the individual models to the data table, or to the Formula Depot, and then use Model Comparison to compare the performance of the models on the validation set to select the best one. Now, with the new Model Screening platform in JMP Pro 16, this workflow has been streamlined. In this talk, you'll learn how to use Model Screening to simultaneously fit, validate, compare, explore, select and then deploy the best-performing predictive model.   Auto-generated transcript...   Speaker Transcript Mia Stephens model screening. If you do any work with predictive modeling, you'll find that model screening helps you to streamline your predictive modeling workflow. So in this talk I'm going to provide an overview of predictive modeling and talk about the different types of predictive models we can use in JMP. We'll talk about the predictive modeling workflow within the broader analytics workflow, and we'll see how model screening can help us to streamline this workflow. I'll talk about some metrics for comparing competing models using validation data, and we'll see a couple of examples in JMP Pro. First let's talk a little bit about predictive modeling and what predictive modeling is. You've probably been exposed to regression analysis, and regression is an example of explanatory modeling. In regression we're typically interested in building a model for a response, or Y, as a function of one or more Xs, and we might have different modeling goals. We might be interested in identifying important variables. So what are the key Xs, or key input variables, for example, in a problem solving setting that we might focus on to address the issue? We might be interested in understanding how the response changes, on average, as a function of the input variables. For example, a one unit change in X is associated with a five unit change in Y. So this is classical explanatory modeling, and if you've taken statistics in school, this is probably how you learned about regression. Now, to contrast, predictive modeling has a slightly different goal. In predictive modeling our goal is to accurately predict or classify future outcomes. So if our response is continuous, we want to be able to predict the next observation, the next outcome, as precisely or accurately as possible. And if our response is categorical, then we're interested in classification. And again we're interested in using current data to predict what's going to happen at the individual level in the future. And we might fit and compare many different models, and in predictive modeling we might also use some more advanced models. We might use some machine learning techniques like neural networks. And some of these models might not be as easy to interpret, and many of them have a lot of different tuning parameters that we can set. And as a result, with predictive modeling we can have a problem with overfitting.
What overfitting means is that we fit a model that's more complex than it needs to be. So with predictive modeling we generally use validation, and there are several different forms of validation we can use. We use validation for model comparison and selection, and fundamentally, it protects against overfitting but also underfitting. Underfitting is when we fit a model that's not as complex as it needs to be; it's not really capturing the structure in our data. Now, in the appendix at the end of the slides, I've pulled some slides that illustrate why validation is important, but for the focus of this talk I'm simply going to use validation when I fit predictive models. There are many different types of models we can fit in JMP Pro, and this is not by any means an exhaustive list. We can fit several different types of regression models. If we have a continuous response, we can fit a linear regression model; if our response is categorical, a logistic regression model. But we can also fit generalized linear models and penalized regression methods, and these are all from the fit model platform. There are many options under the predictive modeling platform, so neural nets, neural nets with boosting and different numbers of layers and nodes, classification and regression trees and more advanced tree-based methods, and several other techniques. And there are also a couple of predictive modeling options from the multivariate methods platform, so discriminant analysis and partial least squares are two additional types of models we can use for predictive modeling. And by the way, partial least squares is also available from fit model. And why do we have so many models? In predictive modeling, you'll often find that no one model or modeling type always works the best. In certain situations, a neural network might be best, and neural networks are generally pretty good performers. But you might find in other cases that a simpler model actually performs best, so the type of model that fits your data best and predicts most accurately is based largely on your response, but also on the structure of your data. So you might fit several different types of models and compare these models before you find the model that fits or predicts most accurately. So, within the broader analytic workflow, where you start off with some sort of a problem that you're trying to solve, and you compile data, prepare the data, and explore the data, predictive modeling is down in analyze and build models. And the typical predictive modeling workflow might look something like this, where you fit a model with validation. Then you save that formula to a data table or publish it to the formula depot. And then you fit another model and you repeat this, so you may fit several different models and then compare these models. And in JMP Pro, you can use the model comparison platform to compare the performance of the models on the validation data, and then you choose the best model, or the best combination of models, and then you deploy the model. And what's different with model screening is that all of the model fitting, comparison, and selection is done within one platform, the model screening platform. So we're going to use an example that you might be familiar with, and there is a blog on model screening using these data that's posted in the Community, and these are the diabetes data.
So the scenario is that researchers want to predict the rate of disease progression one year after baseline. So there are several baseline measurements, and then there are two different types of responses. The response Y is the quantitative measure, a continuous measure, so this is the rate of disease progression, and then there's a second response, Y Binary, which is high or low. So Y Binary can represent a high rate of progression or a low rate of progression. And the goal of predictive modeling here is to predict patients who are most likely to have a high rate of disease progression, so that corrective actions can be taken to prevent this. So we're going to take a look at fitting models in JMP Pro for both of these responses, and we'll see how to fit the same models using model screening. And before I go to JMP, I just want to talk a little bit about how we compare predictive models. We compare predictive models on the validation set or test set. So basically what we're doing is we fit a model to a subset of our data called our training data. And then we apply that model to data that were held out (typically we call these validation data) to see how well the model performs. And if we have continuous responses, we can use measures of error, so root mean square error (RMSE) or RASE, which is root average squared error, so this is the measure of prediction error. AAE, MAD, MAE, these are measures of average error, and there are different R square measures we might use. For categorical responses, we're most often interested in measures of error or misclassification. We might also be interested in looking at an ROC curve, looking at AUC (area under the curve), or sensitivity, or specificity, the false positive rate, the false negative rate, the F1 score and MCC, which is the Matthews correlation coefficient. So let me switch over to JMP. Tuck this away and I'll open up the diabetes data. And let me make this big so you can see it. So these are the data. There's information on 442 patients, and again, we've got a response Y, which is continuous, and this is the amount of disease progression after one year. And we'll start off looking at Y, but then we also have the second variable, Y Binary. We've got baseline information. And there's a column Validation. So again, when we fit models, we're going to fit the models only using the training data, and we're going to use the validation data to tell us when to stop growing the model and to give us measures of how good the model actually fits. Now, to build this column, there is a utility under predictive modeling called Make Validation Column. And this gives us a lot of options for building a validation column, so we can partition our data into training, validation, and a test set. And, in most cases, if we're using this sort of technique of partitioning our data into subsets, having a test set is recommended. Having a test set allows you to have an assessment of model performance on data that wasn't used for building the models or for stopping model growth, so I'd recommend that, even though we don't have a test set in this case. So let's say that I want to find a model that most accurately predicts the response. So as you saw, there are a lot of different models to choose from. I'll start with fit model. And this is usually a good starting point.
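Before the Fit Model step that follows, here is a minimal JSL sketch of the idea behind a validation column. It is not the Make Validation Column utility itself; it simply assigns rows at random by formula, assuming the Diabetes sample table that ships with JMP and a 70/30 training/validation split (as noted later in the talk, JMP treats the lower value as training).

// Open the sample table and add a simple random validation column
// (0 = training, 1 = validation); named "My Validation" to avoid clashing with any existing column
dt = Open( "$SAMPLE_DATA/Diabetes.jmp" );
dt << New Column( "My Validation",
	Numeric, "Nominal",
	Formula( If( Random Uniform() < 0.7, 0, 1 ) )
);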
So I'm going to build a regression model with Y as our response, the Xs as model effects, and I'll only put in main effects at this point. And I'm going to add a validation column. Now from fit model, the default here is going to be standard least squares, but there are a lot of different types of models we can fit. I'm simply going to run the least squares model. A couple of things to point out here. Notice the marker here, V. Remember that we fit our model to the training data, but we also want to see how well the model performs on the validation data, so all of these markers with a V are observations in the validation set. Because we have validation, there is a crossvalidation section here, so we can look at R square on the training set and also on the validation set, and then RASE. And oftentimes what you'll see is that the validation statistics will be somewhat worse than the training statistics, and the farther off they are, the stronger the indication that your model is overfit or underfit. I want to point out one other thing here that's really beyond the scope of this talk, and that's this prediction profiler. The prediction profiler is actually one of the reasons I first started using JMP. It's a really powerful way of understanding your model. And so I can change the values of any X and see what happens to the predicted response, and this is the average, so with these models, we're predicting the average. But notice how these bands fan out for total cholesterol and LDL, HDL, right. And this is because we don't have any data out in those regions. So the new feature I want to point out really quickly, and again this is beyond the scope of this talk, is this thing called extrapolation control. And if I turn on a warning and drag total cholesterol, notice that it's telling me there's a possible extrapolation. This is telling me I'm trying to predict out in a region where I really don't have any data, and if I turn this extrapolation control on, notice that it truncates this bar, this line, so it's basically saying you can't make predictions out in that region. So it's something you might want to check out if you're fitting models. It's a really powerful tool. So let's say that I've done all the diagnostic work. I've reduced this model, and I want to be able to save my results. Well, there are a couple of ways to do this. I can go to the red triangle, Save Columns, and save the prediction formula. So this saves the linear model I've just built out to the data table. So you can see it here in the background. And then, if I add new values to this data table, it'll automatically predict the response. But I might want to save the model out in another form, so to do this, I might publish this formula out to the formula depot. And the formula depot is actually independent of my data table, but what this allows me to do is I can copy the script and paste it into another data table with new data to score new data. Or I might want to generate code in a different language to allow me to deploy this within some sort of a production system. I'm going to go ahead and close this. This is just one model. Now is it possible, if I fit a more complicated or sophisticated model, that it might get better performance? So I might fit another model. So, for example, I'm just gonna hit recall, and I might change the personality from standard least squares to generalized regression. And this allows me to specify different response distributions.
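As an aside before the generalized regression run continues: the least squares portion of this workflow can also be scripted. A hedged sketch, assuming the Diabetes table opened above, a few of its baseline column names, and its Validation column; the Prediction Formula message is meant as the scripted equivalent of the red-triangle Save Columns item, and message names can vary slightly by JMP version, so check the Scripting Index.

// Least squares fit of Y with a validation column, then save the prediction formula
fm = dt << Fit Model(
	Y( :Y ),
	Effects( :Age, :BMI, :BP, :Total Cholesterol, :LDL, :HDL ),
	Validation( :Validation ),
	Personality( "Standard Least Squares" ),
	Run
);
fm << Prediction Formula;   // intended equivalent of red triangle > Save Columns > Prediction Formula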
And I'll just stick with normal and click run. So this will allow me to fit different penalized methods and also use different variable selection techniques. And if you haven't checked out generalized regression, it's a super powerful and super flexible modern modeling platform. I'm just going to click go. And let's say that I fit this model and I want to be able to compare this model to the model I've already saved. So I might save the prediction formula out to the data table. So now I have another column in the background in the data table. Or I might again want to publish this to the formula depot, so now I've got two different models there. And I can keep going. So this is just one model from generalized regression. I can also fit several different types of predictive models from the predictive modeling menu, so for example, neural networks or partition or bootstrap forest or boosted trees. Now, typically what I would have to do is fit all these models and save the results out either to the data table or to the formula depot, and if I save them to the data table, I can use this model comparison platform to compare the different competing models. And I might have many models; here I only have two. And I don't actually even have to specify what the models are, I only need to specify validation. And I actually kind of like to put validation down here in the by field. So this gives me my fit statistics for both the training set and the validation set, and I'm only going to look at the statistics for the validation set. So I would use this to help me pick the best performing model. And what I'm interested in is a higher value of R square, a lower value of RASE (the root average squared error), and a lower average absolute error. And between these two models, it looks like this fit least squares regression model is the best. Now, if I were to fit all the possible models, this can be quite time-consuming. So instead of doing this, what's new in JMP Pro 16 is a platform called model screening. And when I launch model screening, it has a dialog at the top, just like we've seen, so I'll go ahead and populate this. And I'll add validation, but over on the side, what you see is that I can select several different models and fit these different models all at one time. So decision tree, bootstrap forest, boosted tree, K nearest neighbors, right, I can fit all of these models. And it won't run models that don't make sense; I have logistic regression as one of my options, but it won't run a logistic regression model with a continuous response. Notice that I've also got this option down here, XGBoost. And the reason that appears is there is an add-in, and it actually uses open source libraries, and if you install this add-in (it's available on the JMP User Community, it's called XGBoost and it only works in JMP Pro), it'll automatically appear in the model screening dialog. So I'm just going to click OK, and when I click OK, what it's going to do is go out and launch each of these platforms. And then it's going to pull all the results into one window. So I clicked okay. I don't have a lot of data here, so it's very fast. And under the details, these are all of the individual models that were fit. And if I open up any one of these, I can see the results, and I have access to additional options that will be available from that menu.
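For scripting, the same launch can be reproduced with the Model Screening platform (JMP Pro 16 and later). A hedged sketch: the Y, X, and Validation roles mirror the dialog, but the random seed option name and any per-method toggles are assumptions here and are best confirmed against a saved platform script from your JMP version.

// Launch Model Screening on the continuous diabetes response
ms = dt << Model Screening(
	Y( :Y ),
	X( :Age, :BMI, :BP, :Total Cholesterol, :LDL, :HDL ),
	Validation( :Validation ),
	Set Random Seed( 12345 )   // assumed option name; a seed gives repeatability, as in the demo
);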
So I'm going to tuck away the details. And by default, what it's done is it's reported statistics for the training data, but it shows me the results for the validation data, so I can see R square and I can see RASE. And by default it's sorting in order of RASE, where lowest is best. But I've got a few little shortcut options down here, so if I want to find the best models, and it could be that R square is best for some models but RASE is better for others, I'm going to click Select Dominant. In this case, it selected neural boosted, so across all of these models, the best model is neural boosted. And if I want to take a closer look at this model, I can either look at the details up here under Neural, or I can simply say Run Selected. Now I didn't talk about this, but in the dialog window there's an option to set a random seed. And if I set that random seed, then the results that launch here will be identical to what I see here. So this is a neural model with three nodes using the TanH function, but it's also using boosting. In designing this platform, the developers did a lot of simulations to determine the best starting point and the best default settings for the different models. So neural boosted is the best. And if I want to be able to deploy this model, now what I can do is save the script, or I could run it, or I can save it out to the formula depot. So this is with a continuous response, and there are some other options under the red triangle. What if I have a categorical response? For a categorical response, I can use the same platform. So again, I'll go to model screening. I'll click Recall, but instead of using Y, I'll use Y Binary. And I'm not going to change any of the default methods. I will put in a random seed, for example, 12345, I'm just grabbing a random number. And what this will do is give me repeatability. So if I save any model out to the data table or to the formula depot, the statistics will be the same and the model fit will be the same. A few other options here. We might want to turn off some things like the reports. We might want to use a different cross-validation method, so this platform includes K fold validation, but it also offers nested K fold cross-validation. And we can repeat this. So really nice. Sometimes partitioning our data into training, validation, and test isn't the best, and K fold can actually be a little bit better. And there are some additional options at the bottom. So we might want to add two-way interactions. We might want to add quadratic effects. Under additional methods, this will fit additional generalized regression methods. So I'm just going to go ahead and click OK. OK. It runs all the models and again, this is a small data set. It's very quick. Right, the look and feel are the same, but now the statistics are different. So I've got the misclassification rate. I've got an area under the curve. I've also got some R square measures and then root average squared error. I'm going to click Select Dominant, and again, the dominant method is neural boosted. Now, what if I want to be able to explore some of these different models? So the misclassification rate here is a fair amount lower than it is for stepwise. The AUC is kind of similar, it's lowest overall but maybe not that much better. And let me grab a few of these. So if I click on a few of these, maybe I'll select these four, these five.
I can look at ROC curves, if I'm interested in looking at ROC curves to compare the models. And there are some nice controls here to allow me to turn off models and focus on certain models. And a new feature that I'm really excited about is this thing called a decision threshold. What the decision threshold allows us to look at, and it starts by looking at the training data, is a snapshot of our data. The misclassification rate is based on a cutoff of .5 for classification. So for each of the models, it's showing me the points that were actually high and low. And if we're focusing in on the high, the green dots were correctly classified as high, and the ones in the red area were misclassified, so it's showing us correct classifications and incorrect classifications, and then it's reporting all the statistics over here on the side. And then down below we see several different metrics plus a lot of graphs allowing us to look at false classifications and also true classifications. I'm going to close this and look at the validation data. So why is this useful? Well, you might have a particular scenario where you're interested in maximizing sensitivity while maintaining a certain specificity. And there are some definitions over here: sensitivity is the true positive rate; specificity is the true negative rate. This is a scenario where we want to look at disease progression, so we want to make sure we are maintaining a high sensitivity rate while also making sure that our specificity is high, all right. So what we can do with this is there's a slider here, and we can grab this slider and see how the classifications change as we change the cutoff for classification. So I think this is a really powerful tool when you're looking at competing models, because with a cutoff of .5 some models might have the best overall misclassification rate, but you might also have scenarios where, if you change the cutoff for classification, different models perform differently. So, for example, if I'm in a certain region here, I might find that the stepwise model is actually better. Now to further illustrate this, I want to open up a different example. And this example is called credit card marketing. And if I go back to my slides just to introduce this scenario: this is a scenario where we've got a lot of data based on market research on the acceptance of credit card offers. The response is, was the offer accepted. And this is a scenario where only 5.5% of the offers in the study were actually accepted. Now there are factors that we're interested in, so there are different types of rewards that are offered, and there are different mailer types. So this is actually a designed experiment. We're going to, kind of, ignore that aspect of this study. And there's also financial information, so we're going to stick to one goal in this example, and that's the goal of identifying customers who are most likely to accept the offer. And if we can identify the customers that are most likely, in this scenario we might send offers only to the subset that is more likely to accept the offer and ignore the rest. So that's the scenario here. And I'm going to open these data. I've got 10,000 observations and my response is Offer Accepted.
And I've already saved the script to the data table, so I've got air miles, mailer type, income level, credit rating, and a few other pieces of financial information. I ran a saved script, and it's going through running all the models. Neural, in this case, will take the most time because it's running a boosted neural. It will take a few more seconds. It's running support vector machines. Support vector machines will time out and actually won't run if I have more than 10,000 observations. I'm going to give it another second. I'm using standard validation for this, where I've got a validation column. And in this case, I've got a column of zeros and ones, and JMP will recognize the zeros for the training data and the ones for the validation data. Okay, there we go. Okay, so it ran, and if you're dealing with a large data table, there is a report you can run to look at elapsed times. And for this scenario, support vector machines actually took the longest time, and this is why, at times, it won't run if we have more than 10,000 observations. So let's look at these. Our best model, if I select dominant, is a neural boosted and a decision tree, but I want to point something out here. Notice the misclassification rate. The misclassification rate is identical for all of the models, except support vector machines. And why is this the case? Well, if I run a distribution of Offer Accepted (let me make this a little bit bigger so we can see it) and just focus in on the validation data, notice that only 0.063 of our observations were Yes. This is exactly what our model predicted. And why is it doing this? I'm going to again ask for the decision threshold. And focusing on this graph here, and this graph has a lot of uses, in this case what it shows us is that our cutoff for classification is .5, but none of our fitted probabilities were even close to that, right. So as a result, the model either classified the no's correctly as no's or classified the yeses as no's. It never classified anything as a yes, because none of the probabilities were greater than .5. So if I stretch this guy out, right, I can see the difference in these two models. The top probability was around .25 for the neural boosted, and for the decision tree it was about .15. And notice that the decision tree is basically doing a series of binary splits, so I've got several predicted values, whereas for neural boosted it's showing me a nice random pattern in the points. So let me change this to something like .12. Right, and it cut off at .12; in fact, if I slide this around, notice that the lower I get, I actually start getting (I'm going to turn on the metrics here) some true positives. And I start getting some false positives. So as I drag this, you can see it in the bar, but the bar is kind of small, right. Neural boosted, I'm starting to see some true positives and some false positives. And now you start seeing them. As soon as I get past this first set of points, I start seeing it for the decision tree. So, using a cutoff of .5 doesn't make sense for these data, and again I might try to find a cutoff that gives me the most sensitivity while maintaining a decent level of specificity. In this case, I'm going to point out these two other statistics. F1 is the F1 score, and this is really a measure of how well we're able to capture the true positives.
MCC is the Matthews correlation coefficient, and this is a good measure of how well it classifies within each of the four possibilities. So I can have a false positive, a false negative, a true positive, a true negative. And I didn't actually say what corresponds to the boxes here, but I've got four different options. MCC is a correlation coefficient that falls between minus one and plus one, and it measures how well I'm predicting in each one of those four boxes. So I might want to explore a cutoff that gives the maximum F1 value or the maximum MCC value. And let's say that I drag this way down. Notice that the sensitivity is growing quickly and specificity is starting to drop, so maybe at around .5, right, I reach a point where I'm starting to drop off too far in specificity. I might find a cutoff, and at the bottom there's this option to set a profit matrix. If I set this as my profit matrix, basically what's going to happen is it will allow me to score new data using this cutoff. So if I set this here and hit okay, right, any future predictions that I make, if I save this out to the data table or to the formula depot, will use that cutoff. And this is a scenario where I might actually have some financial information that I could build into the profit matrix. So, for example, instead of using the slider to pick the cutoff, maybe I have some knowledge of the dollar value associated with my classifications. And maybe if the actual response is a no, but I think they're going to be a yes and I send them an offer, maybe this costs me $5, right, so I have a negative value there. And maybe I have some idea of the potential profit, so maybe the potential profit over this time period is $100, and maybe I've got some lost opportunity. Maybe I say, you know, it's -100 if the person would actually have responded but I didn't send them the offer. So maybe this is lost opportunity, and sometimes we leave this blank. Now if I use this instead, I have some additional information that shows up, so it recognizes that I have a profit matrix. And now if I look at the metrics, I can make decisions on my best model based on this profit. So I'm bringing this additional information into the decision making, and sometimes we have a profit matrix and we can use that directly and sometimes we don't. And this is one of those cases where I can see that the neural boosted model is going to give me the best overall profits. So this is a sneak peek at model screening, and let me go back to my slides. And what have we seen here? Well, we talked about predictive modeling and how predictive modeling has a different goal than explanatory modeling. Our goal here is to accurately predict or classify future outcomes, so we want to score future observations. And we're typically going to fit and compare many models using validation and pick the model, or the combination of models, that does the best job, that predicts most accurately. Model screening really streamlines this workflow, so you can fit many different models at the same time with one platform. And I really only went with the defaults, so I can fit much more sophisticated models than I actually fit there. It makes it really easy to select the dominant models and explore the model details. We can fit new models from the details.
This decision threshold, if you're dealing with categorical data, allows you to explore cutoffs for classification, and it also integrates the ability to include a profit matrix. And for any selected model, we can deploy the model out to the formula depot or save it to the data table. So it's a really powerful new tool. For more information: the classification metrics, I know before I saw the F1 score and the Matthews correlation coefficient, those statistics were relatively new to me. To make sense of sensitivity and specificity, this Wikipedia post has some really nice examples and a really nice discussion. There are also some really nice resources for predictive modeling and also model screening. In the JMP User Community, there's a new path, Learn JMP, that has access to videos, Mastering JMP series videos. There was a really nice talk last year at JMP Discovery in Tucson by Ruth Hummel and Mary Loveless on Which Model When, and it does a nice job of talking about different modeling goals and when you might want to use each of the models. If you're brand new to predictive modeling, in our free online course, STIPS, which is Statistical Thinking for Industrial Problem Solving, Module 7 is an introduction to predictive modeling, so I'd recommend this. There is a model screening blog that uses the diabetes data that I'll point out, and I also want to point out that there's a second edition of the book Building Better Models with JMP Pro coming out within the next couple of months. They don't have a new cover yet, but they include model screening in that book. So that's all I have. Please feel free to post comments or ask questions, and I hope you enjoy the rest of the conference. Thank you.
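As a quick reference to go with those resources, the classification metrics mentioned in this talk (sensitivity, specificity, the F1 score, and MCC) have standard textbook definitions in terms of the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) at a given classification cutoff:

sensitivity (true positive rate) = TP / (TP + FN)
specificity (true negative rate) = TN / (TN + FP)
F1 = 2 TP / (2 TP + FP + FN)
MCC = (TP * TN - FP * FN) / sqrt( (TP + FP)(TP + FN)(TN + FP)(TN + FN) )

MCC falls between -1 and +1, as described above. F1 does not involve the true negatives at all, which is one reason it is useful when the Yes class is rare, as in the credit card example.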
Yassir EL HOUSNI, R&D Engineer/Data Scientist, Saint-Gobain Research Provence Mickael Boinet, Smart Manufacturing Group Leader, Saint-Gobain Research Provence   Working on data projects across different departments such as R&D, production, quality and maintenance requires taking a step-by-step approach to the pursuit of progress. For this reason, a protocol based on Knowledge Discovery in Databases (KDD) methodology and Six Sigma philosophy was implemented. A real case study of this protocol using JMP as a supporting tool is presented in this paper. The following steps were used: data preprocessing, data transformation and data mining. The goal of this case study was to improve the technical yield of a specific product through statistical analysis. Due to the complexity of the process (multi-physics phenomena: chemical, electrical, thermal and time), this approach has been combined with physical interpretations. In this paper, the data aggregation (coming from more than 100 sensors) will be explained. In order to explain the yield, decision tree learning was used as the predictive modelling approach. Decision tree learning is a method commonly used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. In our case, a model based on three input variables was used to predict the yield.   Auto-generated transcript...   Speaker Transcript YASSIR EL HOUSNI Hello. I am Yassir El Housni, R&D engineer and data scientist in the smart manufacturing team of Saint-Gobain Research Provence in Cavaillon, France. We are working for the ceramic materials business units. In this post, we have two parts. In the first we will present the data project life cycle that we propose for manufacturing data projects. And in the second, we will present two use cases from Saint-Gobain ceramic materials industries. Working on data projects across different departments, such as R&D, production, quality and maintenance, requires taking a step-by-step approach to pursue progress. For this reason, we implemented a protocol based on the Knowledge Discovery in Databases methodology and the Six Sigma philosophy, also known as DMAIC: define, measure, analyze, improve, and control. We define in this infinity loop seven steps to pursue in order to correctly manage a data analysis project. In all of them, we ensure a good understanding of the process, because we believe it's a key to successful data projects in an industrial world. For example, to detect the variation in the process we use the SIPOC or flow chart map, and to detect the causes of variation we use our problem-solving toolbox, which contains a ??? or Ishikawa diagram. The infinity loop also presents another route to achieve continuous improvement. In the next slides we will detail the approach, step by step. Let's start with defining the project. It's necessary to clearly define three elements before starting a data project. We propose here some questions which we found very useful to clearly define the elements of the trade(?). First of all, business need definition. Frequently, the target of a data project in manufacturing is to optimize a process, maximize a yield, improve the quality of a specific product or reduce the consumption of energy. Under the definition of the business opportunity, we should know how it will be used: is the target need just visualization, or analytics? And after that, the impact should bring a quantified gain to the business. Secondly, data availability and usability.
In it we launch a diagnostic analysis of data quality. It is very important in this step to determine the feasibility of the data project. And then the team setup: a person from the data team, a person from the business unit team, and a person from the plant, a process engineer with a Six Sigma Green or Black Belt. Let's move to the second step, data preparation, with transformation, integration, and cleansing. It's an important step which consumes a lot of time in data projects. For example, we have here different sources of data and we need to centralize them in one table. Mainly we use X for inputs and Y for outputs. In this step we use different tools in JMP such as missing value processing, the ??? of constant variables, and of course the JMP data table tools, which ensure the right SQL-style requests to correctly transform tables. The third step is about exploring data with dynamic visualization, and with JMP we have a large choice for visualization. For example, plot the distribution of a variable and estimate the law that it follows; detect the outliers with box plot diagrams; nonlinear regression between two variables; contour or density mapping to determine the principal placement of the concentration of each population; and we have a large choice of ways to plot it. ???, we use them usually in our work and we found them very useful. The fourth step is the development of the model, and it depends on the kind of analysis that we need: is the target to explain or to predict? The first is about links between variables and serves to explain patterns in data. The second is about a formula and serves to predict patterns in data. Generally we cut our data sets into three blocks: 70% for training, 20% for testing, and 10% for validation. And sometimes, if we have a small amount of data, we use 70% for training and 30% for validation. If the model is good, we request a new set of data in order to drive decision making. We have two approaches, supervised and unsupervised learning. Today at Saint-Gobain we use the standard version of JMP, and we have access to the supervised learning tools such as linear regression, decision trees and neural networks. We work a lot today with decision trees because they give us relevant results, which help us to resolve challenges in the ceramic materials industry. The fifth step is about finding the optimal solution. Sometimes it's just one solution, but in other cases it's a combination of several models. And to ensure good sense, we add some constraints, for example the min, max and step of variation of each variable. The JMP Profiler gives us large possibilities to quickly optimize solutions. From here, the next step is passed to the plant, with the support of our process engineer with the Six Sigma Green or Black Belt. The sixth step is about implementing the best solution in the plant, governed by only one representative model. For example, we implement the control charts of output Y1 and analyze the different variations. In the seventh step, we monitor the model effectiveness and we visualize the global gain of working on our data project. For example, in the pie charts, we see the impact on the global yield. And last but not least, the preparation of step one for a new data project, to ensure continuous improvement and the continuity of the infinity loop. That was all about the data project life cycle, and now we will present two real case studies of the protocol using JMP.
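Before the case studies, here is a minimal JSL sketch of the "centralize different sources into one table" step described in data preparation. The table names, file names, and the Batch ID key are hypothetical; the Join message itself is the standard Tables > Join operation.

// Hypothetical sensor and quality tables joined on a shared batch identifier
dt sensors = Open( "sensor_data.jmp" );
dt quality = Open( "quality_data.jmp" );
dt all = dt sensors << Join(
	With( dt quality ),
	By Matching Columns( :Batch ID = :Batch ID ),
	Drop Multiples( 0, 0 ),
	Include Nonmatches( 0, 0 ),
	Output Table( "Xs and Ys" )
);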
In the two examples we studied the same process technology, the electric arc furnace, but with different products and different targets. In this technology we have complex multi-physical phenomena: electrical, thermal, chemical, time and other physical effects. Here we have, for example, more than 100 process variables that come from different kinds of sensors. The business need was to explain the global yield of a specific product, JO7. In it we detect many kinds of defects, from Defect #1 and #2 up to #N. To prepare the data sets correctly, we used Pareto charts, outlier processing with the Mahalanobis distance, recoding of attributes to correct typing errors, and missing value processing. In step three, explore, we present here just an example of the correlation that we studied between inputs, to reduce the number of variables before working with the models. As a result, we found a decision tree with just three variables; the goal here was to explain why the yield was not at its maximum. So we have a decision tree with just three variables out of 100. For the model we used 70% of the data for training, because we do not have a large data set, and 30% for validation, and we got good results with a high R square: as you see, it is more than 70%. So the message that we passed to the plant is that with a specific setting of X1, X2, and X3 alone we can explain the global yield, and if we need to maximize Y, the percentage of this yield, we need a specific setting just for X1 and X3, and the global yield should improve rapidly. That is the point, at the cluster of points here. And for each project, we also give the plant the physical understanding of each parameter. For the second example, with the same technology but another product, we have just 80 process variables and the target was to improve the number of pieces with no defect, D1. The need is about explaining the quality of a specific product, so we used the same methodology for the steps. For example, here we studied the same kind of correlation between inputs to reduce the number of variables that we put in the model. And, as a result, we also used a decision tree, but here we found 12 variables that explain this global yield with good results: as you see, the R square was 84%, the RMSE was 3% and the sample size was 287. Here we used the cross-validation method because we have a very small data table. The first parameter was very important; as you see, it contributes 50%. And it was difficult to explain that with the 12 variables to the plant. So when we plot just the first variable, we see visually that we can define a threshold with the variable X1 alone, and with it the global yield should improve rapidly. Thank you.
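As a footnote to the two case studies above, the correlation screening between inputs can be sketched as follows in Python; the actual analysis was done in JMP, and the 0.9 threshold and the way ties are resolved are assumptions of this sketch, not values from the talk.

# Hypothetical sketch: drop one variable from every highly correlated input pair
# before modeling, so ~100 sensor variables shrink to a smaller candidate set.
import pandas as pd

def drop_correlated(X: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    corr = X.corr().abs()
    cols = corr.columns
    to_drop = set()
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] > threshold and cols[i] not in to_drop and cols[j] not in to_drop:
                to_drop.add(cols[j])          # keep the first member of each correlated pair
    return X.drop(columns=sorted(to_drop))

# Example usage with a hypothetical table of sensor inputs:
# X_reduced = drop_correlated(pd.read_csv("sensors.csv"), threshold=0.9)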
Arhan Surapaneni, Student, Stanford OHS Siddhant Karmali, Student, Stanford OHS Saloni Patel, Student, Stanford OHS Mason Chen, Student, Stanford OHS   Our projects include topics ranging from high-level analysis of gambling utilizing hypothesis testing tools, probabilistic calculations and Monte Carlo simulation (with Java vs. Python programming) to strategic leadership development through quantification of troop strength in the Empire: Four Kingdoms video game. These projects carefully consider decision-making scenarios and the behaviors that drive them, which are fundamental to the domains of cognitive psychology and consciousness. The tools and strategies used in these projects can facilitate the creation of user interfaces that incorporate statistics and psychology for more informative user decision-making, for example in minimizing players' risk of compulsive gambling disorder. The projects are about the game of poker and use eigenvector plots, probability and neural network-esque Monte Carlo simulations to model gambling disorders through a game consisting of AKQJ cards. Offering a subtle analytical approach to gambling, the economic drawbacks are explained through multi-step, realistic statistical modeling methods.     Auto-generated transcript...   Speaker Transcript Siddhant Karmali Hi everyone, this is Siddhant Karmali, Mason Chen and Arhan Surapaneni, and we're working on optimizing the AKQJ game for real poker situations. COVID-19 has affected mental health and can worsen existing mental health problems. The stressors involved in the pandemic, namely fear of disease or losing loved ones, may impact people's decision-making ability and can lead them to addictive behaviors. Addiction to gambling is one such behavior that has increased due to an increase in the site traffic of online gambling sites. This project analyzes how different situations in the game of poker affect how people make irrational decisions, including situations that may lead to problem gambling. We developed a simplified model of poker that only uses the ace and the face cards, so A, K, Q, and J, which increases the probabilities of certain winning hands; we called it AKQJ for ace, king, queen, jack. The variables in this model are the card value, the number of players, the number of betting rounds, and whether cards are open or hidden. The objective of this model is to simplify the complicated probability calculations for the winning outcomes in a full game of poker. And we will extend this objective to the idea that, since poker in real life has more than one betting round, we can show that this model is effective even in different variations of poker with different numbers of betting rounds. So this is the outline of the project. First we researched emotional betting and compulsive gambling: what are the risk factors for compulsive gambling, how do compulsive gamblers think, and why do they gamble? We found that people gamble for thrills, just like people who have addictions to drugs use the drug for a high or thrill. So we infer that gambling as an addiction must hit the same chords in the brain that are involved in the reward system. And then we went to our technology, which was using hidden and open cards in real cases.
So hidden and open cards are...so open cards are the cards that a player keeps face up and the hidden cards are face down, and only the player knows its identity. The and then we made two separate algorithms. There was a comprehensive algorithm and a worst case algorithm. But comprehensive algorithm is more complicated since all the cards are hidden and it's hard to do calculations, and the worst case algorithm had some open cards so players...or our modeled players could infer whether you take the bet or not. And so this was our engineering part. We used JMP to model players play styles. And we also used Java and Python programming, as Arhan will show, to generate...to randomly generate card situations, and we calculated the probabilities and conducted correlation and regression tests in JMP. So hidden...hidden and open cards. Open cards are, as I mentioned, open cards are the play...cards that a player keeps face up so other players can see it. And hidden cards are facedown and only the player knows its identity. The comprehensive algorithm, which ...which is what...usually what happens in a real game of poker where players have to try and calculate the probability of them winning against another person or them winning against their opponent, based on their current hand. And in a comprehensive algorithm it's hard to do, since all the cards are hidden and you don't know which which card which player has. And the open cards make AKQJ game easier calculation wise. And the number of hidden cards increases with the number of betting rounds. so the first case we did was with one round and six players, which had six hidden cards. Then we have one betting round and five players, which had seven hidden cards and so on. So earlier, we...or in the model, there were six players given labels A through F. We assign them probability characteristics, which are the percentages of confidence they have to make a bet. A's is 0%, B's is 15%, C's is 30%, D's is 45%, E's is 60% and F is 75%. And F's 75% probability means that unless they are...unless they are 75% sure...at least 75% sure that they will win against that person...their opponent, then they will not take the bet, so it means they're very, very conservative with their betting. a general poker case, which is the comprehensive algorithm, and the worst case algorithm. The general method is calculated or, for example, if we're trying to calculate the probability of A winning the poker match, in terms of the general method, we would have to use the probability of A is the probability of A versus A winning versus B times the probability of A winning versus C, all the way to probability of A winning versus F. This takes a very long to calculate and it's cumbersome in a real poker match, since the betting round time can be 45 seconds to a minute and not many people can do this kind of calculation in a minute. So the worst case...so that's where we developed the worst case method. The worst case...we calculated the worst case outcome by seeing which player can make the best hand with the cards they can see, out of four shared cards which are which are open to all players and one hidden card and one open card per player. We use these two algorithms in three different cases. The first case is with one betting round and six players. We have to determine in which cases each player will fold or stay and how many chips they will win or lose. 
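Before walking through the cases, here is a toy Python sketch of the two calculations just described: the general (comprehensive) method multiplies a player's pairwise win probabilities over all opponents, while a worst-case style shortcut looks only at the least favorable matchup. The pairwise values below are made-up placeholders, and this is a simplification, not the authors' actual card-based algorithm.

# Toy sketch under assumed pairwise win probabilities for player A.
from math import prod

pairwise = {"B": 0.55, "C": 0.60, "D": 0.48, "E": 0.52, "F": 0.45}   # P(A beats X), placeholders

p_general = prod(pairwise.values())      # general method: A must beat every opponent
p_worst_case = min(pairwise.values())    # worst-case style: only the strongest opponent matters

confidence_F = 0.75                      # e.g. F only bets when at least 75% sure of winning
print(round(p_general, 3), p_worst_case, p_worst_case >= confidence_F)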
For example, A stays even if they lose chips, because that was one of our modeled players which we knew had a gambling problem, so according to our condition they had to stay. B wins against E but not against anyone else. C does not win in this case. D wins against B and C and ties with A and E. E doesn't win, and F wins against all the other players. Note that because B didn't win enough and because C and E did not win at all, they all fold in the next betting round and lose their chips. Of the ones that stay, considering this is a one-betting-round poker match, the one that is most likely to win is F, since they have the highest worst-case winning probability. And if you go to the previous slide, you can see that for the six-player case, player F's overall probability was very close to 80%, so there is a strong correlation between the AKQJ worst-case method and the general method. The next case is with one betting round and five players. In this case, the confidence values change: A's is 0%, B's is 12%, C's is 25%, D's is 38%, and F's is 65%. Player E was removed, since they lost the most chips and had to fold in the previous round. In this round with fewer players, we see that there are more hidden cards. The number of hidden cards increases with the number of betting rounds and players. With the number of hidden cards increasing, the calculation time may take longer, and this may make players more nervous and unwilling to do those calculations, since they could lose money. In this case, player F didn't win, as shown by how their worst-case winning probability is less than their confidence percentage, so they are forced to fold. This could be because conservative players may not do well in the later stages of the game, because they are too stingy with their money; they do not make the right bets even when the stakes are higher. The next case is with four players, and we did this test to confirm whether player F wins or not; if F does not win, we can say with confidence that more conservative players do not do well in the later stages of the game. Note that F has to keep decreasing their winning probability. We also tested whether the worst-case algorithm matches for five players. In the general method, B has an 11% chance of winning, D has a 46% chance of winning, and F has a 48% chance of winning. This is very close to the worst-case values, and so we get a strong correlation with an R squared of 0.998. This worst case makes F win 50% of the time in the five-player match. We also tested this for four players, in which we confirm that F will win 50% of the time in the four-player case. And then the third case is with three betting rounds and six players. In this case the values are the same, and E's, which we added back, is 56%. In the first round F wins; however, as players start folding, like how B, C and D fold, F has to change their confidence level to match the winning probabilities for a round. F's level changes to 60% after the first round and 54% after the second round. These are modeled players, so this change is built into the model.
For a real player this change would be involuntary, indicating that there is nervousness in a conservative play style, which contributes to such players losing in later rounds. Players A and F represent the extreme playing styles, which may be indicative of problem gambling. And this is a quick summary of the betting round calculations. In a game with two betting rounds, we see that F only wins two times out of 20; F's possible hand is not good enough to match up against the opponent's possible hand. This happens in both two betting rounds and three betting rounds. This is due to the nervousness and to the fact that F's probability threshold was way too high; they could not match their confidence level. So perhaps the optimal strategy for doing well in a poker game is to be not too aggressive with your betting and also not too conservative with your betting. Be like player D, who had a 38% probability, so they would have to be at least 38% sure that they would win against their opponent. Based on this, around that spot is a good place to be for poker. We also did the three-player test to confirm that player F has to fold and player D wins in this round. So we can say that player D has arguably the best strategy in this poker model, the AKQJ poker model, with more than one betting round and fewer than six players. We also did the two betting rounds, to show that F doesn't win either. And this is another case which we did to test whether the outcome of F losing held throughout the betting rounds; we did this with three betting rounds and four players. Now, why is this important? We showed previously that players can perform simple calculations, like the worst case, to control their urge to bet even on a losing hand. People with a gambling addiction may be very aggressive with their betting, and even though they know they're losing their money, they will still bet on the off chance, in the slight hope, that they will make a big win. This is an emotional style of betting and it falls right into the trap of the gambler's fallacy, which is thinking that whether you're on a lucky or an unlucky streak, you will get lucky the next time. These new cases of fewer players and more betting rounds in a poker match introduced the idea of nervousness in players, even the most rational ones. F, in the one-betting-round, six-player calculation, was a player who you could see had experience: he or she was able to wait and had a good strategy. We can apply this to a real poker game, because in humans the sympathetic nervous system releases stress hormones, like adrenaline, into the body, eliciting a fight-or-flight response. When you're in a high-stakes situation like a poker game, you know at a superficial level what's at stake, your money and your assets, so you will bet on that to try and increase it.
That's just inherent human nature. More bandwidth during this poker game is given to the amygdala, which is an area of the brain that controls emotion, rather than the rational and involved in executive functioning prefrontal cortex. So the more hidden cards in a round, players may be more nervous about their bets and make worse bets, even if they... even if they're very experienced in poker. And this nervousness may be correlated with the blindly betting nature of compulsive gambling disorder, based on the concepts of risk calculation and gambling for thrill. And our conclusions were that F, or overly conservative players like player F, may not do as well in realistic poker situations. So the ones that do the best are are on the conservative side but are more willing to bet than your regular very, very stingy player. And get our main takeaway is that gambling disorder may be mitigated if players can understand basic statistical calculations and use them in their games. And the future research, which is actually not in the future, it's going to happen or we've done that research already is, we are going to get more reliable data using Python and that's what Arhan's presentation is going to show. Thank you for watching. Arhan Surapaneni Millions of people visit the capital of casinos, Las Vegas, every year to party, enjoy luxuries, but most importantly, gamble. Gambling, though forming a false reward of success in winnings still hides the dark pitfall of financial and social struggles. Our generation is modernizing with technological advancements rapidly becoming the norm and computer programming languages being the needed subjects to thrive in a modern world. Utilizing modern computer programming tools and ???, we are able to take a deeper look into their this psychological problem and analyze tools to help solve them. This is done with authors Siddhant Karmali, Mason Chen and Saloni Patel, with the esteemed help with advisors, Dr Charles ??? and ???. As previously mentioned millions of people attend casinos per year affecting a large population. With a problem that affects such a large sample size, one of the overarching questions concerns how do we express solutions to the problem of gambling and how do we do this efficiently. Through Siddhant's presentation, we have learned much about original methods that help explain gambling, while also presenting the economic misnomers about gambling, proving that being calm and cautious provides the best results, rather than relying on luck of the game for positive results. All of which was done through Java. This method, however, takes 30 hours just for 92 runs. This is not effective and unusable. Using Python however reduces this time to seconds, while also allowing for higher levels of complexity that provides to be beneficial to the overall methods. To recap, the method required a six player model, each of whom receives two cards in the 16 pack of AKQJ. One of the cards is hidden, with four cards placed on the table. Each player has confidence levels determining how often they fold or continue playing. Continuing playing loses three points and folding loses only one point. This is done in an effort to model players that range from blindly gambling to cautious strategists. Usually when conducting a large scale experiment is very tedious and time consuming to run tests at an adequate amount so that the data that is produced is usable. 
An efficient alternative is using computer programming, more specifically the language Python in a Monte Carlo simulation, defined as a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The Java program applied here in the diagram is not very effective: it only allows for two random samples, and the process of individually sourcing out each specific sequence is difficult. When we are trying to derive larger sets of data, it is important to change this. Analyzing general differences between the languages, it is important to note that Java is statically typed and compiled, while Python is dynamically typed and interpreted. This difference makes Java faster at runtime and easier to debug, but Python is easier to use and easier to read. Python is also an interpreted language with an elegant syntax, making it a very good option for scripting and rapid application development in many areas. This is applied in the following method. One thing we need to look at is the new randomizer, which is applicable with the full 52-card set; the AKQJ 16-card model allows for more accuracy in statistical ??? rendering, especially when using a Monte Carlo simulation. One beneficial aspect that we can add with Python is characters. As Siddhant covered, we can now use wagers. Instead of manually applying different confidence levels on our two-card ??? randomizer, we can use Monte Carlo to make different choices based on a certain percent threshold, with the ability to add or remove wagers, which we will see later on. First, it is important to talk about the deck. Here we see a 16-card array with our shuffling function. This allows the cards to be randomized, similar to the initial function seen in the Java program. This draws two cards from the 16-card total for the different players. With this in mind, there are extensive applications to both the original comprehensive method and our worst-case method, which will be covered and developed later. ??? the same deck allows us to add changes or move things to affect how we compute the worst-case method, which is helpful for our end goal. We didn't have the same flexibility in our old Java method. Python-specific changes for the worst-case method include specific elif/if statements, so that the player with the worst card is marked as a loss, and the formula covered by Siddhant has changed. This is important because it allows more efficiency in data collection and makes the randomizer's outputs more accurate. You can add specific names to these separate cards, which is also another helpful application in itself. This simplified Monte Carlo simulation allows for more complexity, as it lets us add our new wagers based on the scenario, which involves the multiple betting rounds that Siddhant has described. We can change characters by changing the current wager, which affects whether the player stays or leaves the betting round, this being a key difference from the Java method. A key concept in this program is setting a variable that will eventually return a value to the funds, and setting the wager to the initial wager argument. Setting the current wager to zero, we set up a while loop to run the condition continuously until the current wager is equal to the count. Then, for each set where we get a successful outcome, we increase the value by our wager.
If we were to add a command telling us or the character to slow down at a certain wager, then we have a simple way of having threshold to betting. We can also edit the same form to accept specific sequences, like the full house, only allowing a wager when the sequence is present. These thresholds create our upper mentioned risk management level. We can plot the probabilities in ???, where we append to each updated current wager to an array of X values and each updated value to an array of Y values, proceeding to plot both PLT(?) to plot X values and y values. The key component of the better function is the condition if...in the if statement that corresponds to a successful outcome. This can be adapted to any outcome needed, including general scenario and worst case scenario. When we apply our comparisons with the two character and multiple character, we can add important statements that make sure the data is compared properly. Using Python you can make sure drawings like two pairs or instead of higher values ruling out players with a lower value, forcing them out of the game. This allows for more efficiency with the new Python program rather than using the original Java program. Something that would take 30 hours originally can now be done in a matter of seconds. After this program is applied, we are able to run a correlation test with the new results against the original Java results. If we look at the red lines for both the general and worst case methods, we see that they're extremely close to one, indicating strong correlations. This is also paired with the higher R squared value. We also run the one proportion hypothesis test, telling us for both methods we failed to reject the null hypothesis. This value, although high, isn't close to...isn't as close to one. For something that we would expect to be almost identical, because this is computer programming. There are two main reasons for this, the first is sample size. 92 seems like a lot, but it isn't strong...isn't as strong a trend as one would expect. To fix this we are able to increase the sample size to 1,000 or even 10,000. One more reason could be the application of Python. As previously mentioned Python is more comprehensive language, rather than a static language. This could change the effect, but mainly stay the same. Here we see the program finally applied with their results presented. With this program we see what each play...which cards each player draws with the cards... which card is shown and hidden, what cards are on the table, how many chips players gain and lose and, finally, who wins, how they win. In the diagram below, we see Player 1 wins with the full house. the ability to run multiple characters into one computer and one function, describing the different sequences while applying different numbers of players, also showing probability of different outcomes, even adding the multiple betting runs in a regular game of poker, which Siddanth has covered. This is vital to further developments, because it allows for people around the world to use this program and method to develop on ideas as a learning tool able to be utilized to act as therapy to gambling addicts. One more development exploits the neural network (AI) aspect, making this program more detailed by adding features like bluffing, added enough for it to make the program more like actual game seen in casinos. 
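To make the Python workflow described above concrete, here is a compact, simplified sketch, not the authors' actual program: it shuffles the 16-card AKQJ deck, deals two cards to each of six players, and runs the kind of wager loop just described. The win probability and the gain/loss per bet are hypothetical placeholders.

# Simplified sketch of the AKQJ deal plus a wager loop, under assumed parameters.
import random

RANKS = ["A", "K", "Q", "J"]
SUITS = ["S", "H", "D", "C"]

def new_deck():
    deck = [r + s for r in RANKS for s in SUITS]     # the 16-card AKQJ deck
    random.shuffle(deck)
    return deck

def deal(deck, n_players=6):
    return [[deck.pop(), deck.pop()] for _ in range(n_players)]   # two cards per player

def bet_until(count, wager=1, win_prob=0.45):
    # Repeat bets until `count` wagers are placed; gain `wager` on a win, lose it otherwise.
    funds, current, history = 0, 0, []
    while current < count:
        if random.random() < win_prob:               # hypothetical success condition
            funds += wager
        else:
            funds -= wager
        current += 1
        history.append(funds)
    return history

deck = new_deck()
hands = deal(deck)                                   # six players, two cards each
table = [deck.pop() for _ in range(4)]               # four shared open cards
print(hands, table, bet_until(count=20)[-1])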
Finally, we were able to see that by using Python we get not only higher accuracy but much higher efficiency, turning a program that originally took hours into one that runs in just a few seconds, and ultimately supporting our original hypothesis that being cautious yields more money and more wins. So next time you go to Vegas, or even partake in a game with friends, remember this project and remember that being more careful and taking a bit more time with decisions will help you in the long run. Thank you.
Chi-hong Ho, Student, STEAMS Training Group Mason Chen, Student, Stanford OHS   In late February 2020, the COVID-19 pandemic began gaining momentum, and the stock market subsequently crashed. Many factors may have contributed to the fall of stock prices, but the authors believed the pandemic may have been the main cause. The authors' objectives for this project were to learn about stock investments, earn money in the stock market, and find a model to help determine the timing and amount of the trading. All of the data was Z-standardized to help eliminate bias and for ease of comparison. Specifically, the authors used three Z-standardization values: Z (within stock), Z (NASDAQ Ratio) and Z (Group NASDAQ Ratio). The authors compared the current stock price with the previous stock price, the NASDAQ stock price and the group NASDAQ price average, respectively. After that, the authors combined the three Z-values into a stock index, which greatly decreases the data bias and helps reduce investment risk. Outliers are used to determine the timing of investment. The Quantile Range Outliers method is easily affected by skewness, because it assumes an approximately normal distribution. Robust Fit Outliers is a better tool, because it can reduce the influence of skewness. The authors established a model to help people invest the right amount of money.     Auto-generated transcript...   Speaker Transcript Chi-hong Ho Okay, hello everyone, I'm Chi-hong Ho, a junior at Henry M. Gunn High School. My partner is Mason Chen; he is a sophomore at Stanford Online High School. Our project is about multivariate statistical modeling of stock investment during the COVID-19 outbreak. In late February, due to the coronavirus pandemic spreading around the world, the stock markets started crashing. There are several factors behind this year's stock market crash, such as the COVID-19 pandemic, the OPEC/Russia/USA oil price war, the 2.2 trillion dollar bailout package from the US government, and companies laying off their employees; there is a more than 30% unemployment rate due to the pandemic. Compare that with the 1929 Great Depression unemployment rate, which was about 25%. Also, in November the US held the presidential election. And the manufacturing supply chain shut down, because the coronavirus pandemic had spread in China earlier. The COVID-19 pandemic influenced the stock market like the past crashes that happened in the 1929 Great Depression and on Black Monday in 1987. Because the COVID-19 pandemic had a huge impact on the world, causing many deaths, the stock market was strongly influenced by the pandemic. In US history, the crashes of the Great Depression of 1929 and Black Monday in 1987 both continued for a long period; the COVID-19 crash didn't. This year, the stock market decreased by 25% from the peak, from March to April. Before March, the COVID-19 pandemic in the US had not spread as fast as in other countries. After the COVID spread became global, many countries were locked down, and a national or global lockdown situation affects the stock market. Look at the graph in the left corner; that is the situation that happened in Korea. Asian countries experienced the COVID pandemic before America, so we use the Asian countries' situation to predict what will happen in the US.
Based on this graph, the colored point is the COVID-19 inflection point for Phase II; it may impact the stock curve significantly, because the case growth speed decreases a little in this short period. On the left side is the correlation map of cases versus date for the US. Cases started being added in early February and grew sharply in late March and early April. The right graph is the stock market decline by date, which shows that the two maps are related to each other: when cases grew quickly, the stock market started to crash, and the lowest point was more than 35% down. Comparing China, South Korea and the US, we know that the Asian countries experienced the COVID pandemic before the US, so we could anticipate what would happen in the next few months in the US. Based on that specific table and the data below, I found that the duration of Phase III is really short; by then it is becoming safe for us to go back to work, and compared with the duration of Phase IV, which is double the time of Phase III, by that time we feel even safer going back to work. After we looked at the Asian countries' pandemic phases, we could predict the Phase III and Phase IV durations for the US. We estimated that the US end of Phase III should be around April 15 and the end of Phase IV around May 25 in the best case; in the worst case, the end of Phase III is around April 30 and the end of Phase IV around June 10. Our recent project is to define a stock investment strategy; our objectives are learning and experiencing stock investment, earning money in the stock market, and building a model for judging the times to trade or exchange stocks. Firstly, we own eight high-technology stocks which were purchased in 2008 and 2009, with an average gain of about 400% as of March. Some of the stocks are in the top 20 of the Standard and Poor's 500, with an average gain of more than 800%. We wanted to find a time to sell those stocks and get the money back. Because of the COVID-19 situation and the stock market crash, stock prices were not as high as in March, so we wanted to sell quickly. After selling the high-tech stocks, we looked at 23 COVID-impacted stocks, which are stocks that lost ground because of the pandemic; we expect those stocks to surge after a few months. Our choice of COVID-impacted stocks should have a minimum of a 3 billion dollar market cap. When it comes down to trading stocks, we can also make exchanges: we sell one tumbling stock and buy one rising stock for balance. We need to make sure our stocks will surge in the coming months. We separate the stock transaction decision chart into three levels. The first level is to decide what we will buy, what we will sell and what to exchange. In level two, we pick the stocks from the selling group, the buying group, and the exchange pair from the two groups. The third level is the tools we will use: the Z index and the outlier detection tools. The function of standardization is to give us an idea of how far the real data points are from the mean. Why do we need to use it? Because we need to convert the actual data to an index that is easier for us to compare, and standardization also eliminates the bias of the raw data. On the left side, there are blue boxes, purple boxes and red boxes. The real data is the input we collect, which is in the blue box. The new index after Z standardization is in the red box; that is our output. Z standardization is the tool we use, which is in the purple box.
In the blue box are the NASDAQ stocks, which are popular and which lots of people invest in; high-tech stocks have grown a large amount during the past five years. The range of the Z standardization is from -3 standard deviations up to +3 standard deviations. After the Z standardization, we get Z within stock, Z NASDAQ ratio, and Z group NASDAQ ratio. The Z within stock compares the stock price with the previous stock prices over the past five years, the Z NASDAQ ratio compares the stock price with the NASDAQ stock price, and the Z group NASDAQ ratio compares the stock price with the group NASDAQ mean. We use the Z standardization to help us look at the risk. In the end, we combine all three Z scores into a new stock index, which can help us lower the risk of transactions. Here is the data table after we standardized the raw data; we can see the stock price index change. US stocks have been in a downward trend since a peak around mid-February. Some stocks are more robust and certain ones are impacted by COVID-19. We established this modeling algorithm on March 7-8 and the database on March 14-15. The red indices shown in this figure represent good times to sell those stocks, when we can earn more money than at the non-red indices. Also, some indices are marked in blue; those are the times when we can consider buying the 23 COVID-impacted stocks, because we can lower the cost and gain more in the future. The reason why we use an outlier algorithm is that the outliers help us determine the timing of trading stocks. One way to determine the outliers is to use quantile range outliers. First, we find the interquartile range, which is Quartile 3 minus Quartile 1 (IQR = Q3 - Q1). A value is flagged as an outlier if it is below Q1 - x*IQR or above Q3 + x*IQR, with x equal to 1.5 for regular outliers or 3 for extreme outliers. Why do we choose the extreme outliers? Because regular outliers cannot show the longer timing that we wanted. Extreme values are found using a multiplier of the interquartile range, the distance between the two specified quantiles, so extreme outliers give a wider detection level that we can use in investment, which helps us reduce our risk. But technically the quantile range outliers algorithm assumes a roughly normal distribution. The stock market is not normally distributed, so the outliers will be influenced by the skew factor. Thus we need a more powerful tool that is not influenced by skewness, because for stock performance we care more about the tails than the center of the distribution. The next tool we use is robust fit outliers; we use robust fit outliers to reduce the influence of the skew factor. Outliers and distribution skewness are very much related: if you have many so-called outliers in one tail of the distribution, then you will have skewness in that tail. In quantile range outlier detection, the assumption is a normal distribution, so skewness in the distribution will introduce an inaccuracy in the outlier detection methodology. If the distribution is significantly skewed, as it probably is in stock market data, robust fit outliers are a better method to find the outliers accurately, because they tend to ignore the skew factor. The robust fit outliers method estimates a robust center and spread; outliers are defined as those values that are K times the robust spread away from the robust center.
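Before continuing with the robust fit options, here is a rough Python illustration of the Z-standardization and the quantile-range rule just described. The file and column names (including "KLAC" and "NASDAQ") are hypothetical placeholders, the three Z-values are approximations of the authors' definitions, and summing them into one index is an assumption of this sketch.

# Rough sketch, assuming a daily price table with one column per stock plus "NASDAQ".
import pandas as pd

prices = pd.read_csv("stocks.csv", index_col="date")
nasdaq = prices["NASDAQ"]

def zscore(s: pd.Series) -> pd.Series:
    return (s - s.mean()) / s.std()

stock = prices["KLAC"]
z_within = zscore(stock)                               # vs. the stock's own history
z_nasdaq = zscore(stock / nasdaq)                      # vs. the NASDAQ index
group_mean = prices.drop(columns=["NASDAQ"]).mean(axis=1)
z_group = zscore(stock / group_mean)                   # vs. the group mean
stock_index = z_within + z_nasdaq + z_group            # combined index (assumed sum)

# Quantile-range rule: flag values below Q1 - x*IQR or above Q3 + x*IQR
def quantile_range_outliers(s: pd.Series, x: float = 3.0) -> pd.Series:
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s[(s < q1 - x * iqr) | (s > q3 + x * iqr)]

print(quantile_range_outliers(stock_index, x=3.0))     # x=1.5 regular, x=3 extreme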
The robust fit outliers provide several options for coupling the robust estimate and multiply K, as well as provide tools to manage the outlier found. We use K=2.7 for regular and 4.7 for extreme outliers. After we use the regular robust fit outliers, we can find out the outlier in the selling index data. Look at the right graph. There are so many shaded red cells in F5 and F8 columns, indicating that we can consider to sell those stocks to maximize our profit, because the stock price is selling above average. Each column is showing some stock index change by day going down to the column. The reason why we use the extreme outlier for buying index is that the buying index is dropping. That means it is really difficult to detect the outliers of the buy index. Not like the selling. Selling index is rising, which is easier to for us to determine the outlier. On this page, there are some color blocks in the data table, like B6, B13, B15 and B19, which indicates that we can consider to buy the stocks. Lots of people make money by investing in stocks and most people may choose the right stock to invest in for reasonable ROI. But investors are challenge to find the right amount of money to invest. Also other human psychological factors will favors our certain investment. We can determine the amount of stock we buy, sell or exchange based on this model, which can minimize a personal investment bias and reduce the overall financial risk. The model provides two ways to judge the amount of investment. The first one is the color block analysis. Now in that analysis, the blocks with dark green are good to sell and the blocks with orange or red is good to buy or it's good to exchange. And then in the bottom, there are the transaction levels we define. The L10 is the least investment amount and L1 is the greatest investment amount. If we want to sell the stocks, we will sell not too high, is in the L5 amount. And do the exchange, we also choose the L5 model. And if we do the buying, we can just to consider to buy more, so we just buy L2 amount. So based on this model, we can just manage our ...based on this model, in the investment you will reduce your financial risk. Then the function....okay...this is...in the Phase III is in the exchanging part. Also we are using Z standardization for...to convert the data point into this index, but this is the exchange index. We set up the exchange threshold Z exchange index should be greater than 15. This is an average index we calculate, which can tell investor the time. On the left side, there's the line chart, which shows the change of each exchange pair. Based on this line chart, we can see the trend of S5-B1 is about 15.8 and S5-B14 is about 15.16. So that means we can consider to doing that exchange between the S5-B1 or S5-B14. After Z standardized, we can get the stock...sell stocks index and the exchange index. The selling index is to compare the stock price with past five years stock price. The Z NASDAQ ratio is compared to stock index with the average stock price. So the exchange index is compared to stock selling index, which was the stock buying index. We use Z standardization to help us look after risk. We consider about 184 choices and we need to make sure our investments will be in the right timing and pick up the right pair to do the exchange. We're also using the quantile range outlier algorithm to help us determine the timing. The small value of Q provides a more radical set of outliers than the large value. Look at right side table. 
We use the quantile range outlier method and get the top three outliers whose exchange index value is greater than 19. This is the second time we consider the exchange index. We found the top indices, which are the signal and the best timing for us to do the exchange: the S5-B14 pair at 19.27, the S5-B13 pair at 19.12, and S5-B12 at 19.07. On the left side, we have the timing prediction model. This model is presented with a color-coded, color-box style: the blocks in dark green are good times to sell, and dark orange or red are good times to buy or exchange. The best time is bolded and shown in the graph: April 6, which is the best day to do the exchange since we began collecting exchange data in February 2015. We consider the exchange pair twice, which doubles the insurance that we can make more money in the stock market and greatly reduces the investment risk. On April 7, the exchange index changed little compared with April 6; on that day, the S5-B1 pair had a 19.18 exchange index. The right graph shows the exchange stock information. On April 8, we sold the KLAC stock at $154.32; the market price was $148.85, so we saved about 3.5% on the sale. Also on the same day, we bought the Delta stock at $22.42; the market price was $22.92, so we gained about 2.2% on the exchange. We sold and bought the same amount of stock for balance; the amount of stock sold and bought was equal at 65 shares. After one day, the exchange pair had helped us gain 5.7%. All stock buyers focus on their stock trends. My partner Mason and I monitored the NASDAQ stock daily range outliers from late February 2020 to mid-March 2020. We separated the daily trade window into certain time slots, 30 minutes each, and we wanted to find the best time for trading. There were 24 peak and valley points detected, and the upper threshold was 2.7%. In the figure in the right corner, we can see the stock price at the open, close, high, and low times; we also counted the price range and rank when we did the stock price peak and valley detection. ??? considered the discrete number of the sample size. Among the 24 peak/valley points we detected, the data shows that 17 out of 24 points, about 70%-71%, happened in the first or last hour. We set up a one-proportion test where the null hypothesis, assuming a uniform distribution of probability across the time slots, corresponds to waking up early and having a stock lunch session to do trading. Look at the table in the left corner: the null proportion value is 0.34, which is greater than 0.05, so we cannot reject the null hypothesis for those four slots among the 13 slots available. The table or figure in the right corner shows the distribution of the peak times and valley times. In our research, we provide a new model to pick the right stocks, the ??? the amount of buying and selling, and also the exchange index. Timing is a really important factor in investment. This model of stock investment was accurate most of the time during the COVID-19 pandemic. Our research group invested in the stock market and gained 2.5% after we finished the project. We may use it to predict the future if the pandemic doesn't end. Based on our research, early-bird or last-minute stock trading is favored and can earn more money. Thank you.
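For reference, a one-proportion check like the peak/valley timing test described above can be set up as follows in Python. The counts (17 of 24 points in the first or last hour) and the 4-of-13-slots expectation come from the description above, but this generic sketch is not meant to reproduce the exact JMP output quoted in the talk.

# Generic one-proportion (binomial) test sketch; inputs follow the talk's description.
from scipy.stats import binomtest

observed_hits = 17        # peak/valley points falling in the first or last trading hour
n_points = 24             # total peak/valley points detected
p0 = 4 / 13               # uniform expectation: 4 of the 13 half-hour slots

result = binomtest(observed_hits, n_points, p=p0, alternative="greater")
print(observed_hits / n_points, result.pvalue)   # sample proportion and p-value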
Saloni Patel, Student, Stanford OHS Mason Chen, Student, Stanford OHS   This project investigates the validation of a prediction model and the actual result of the 2020 United States presidential election. The prediction model consists of the predicted election result, which is derived from the z-scores of the number of infected cases, deaths and unemployment increase rates for each of 15 “swing states” along with the 2012-2016 election result average. In order to identify the most important swing states, a Swing State Index was derived using the 2012, 2016, and 2020 election outcomes.  The predicted election result is then subtracted in response to the media’s report about how Donald Trump is expected to lose 3-5 percent of his votes from the 2016 election. The model is used to compare the level of accuracy between the predicted 2020 election result and its subtracted values against the 2020 actual election result. The paired t-test and regression test are used to test the significance between the 2020 actual result and the 2020 predicted result as well as the 2016 actual result and the 2012 actual result to see how the 2020 predicted result compares with the 2016 election result and the 2012 election result in predicting the 2020 actual result. A one proportion hypothesis test is also used to compare the accuracy of the 2020 predicted result with the 2020 actual result.  The next part of this project studies factors that influenced the voting behavior of the 15 key swing states in the 2020 United States presidential election by linking statistical clustering methods with notable political events. In addition to key decisions made in the Trump administration, factors unique to this presidential election such as the global COVID-19 pandemic and the Black Lives Matter movement were investigated. Hierarchical clustering was used to group the 15 swing states based on the Swing State Index, and the relationships between each cluster were attributed with events that may have factored into the cluster behavior. The most representative and significant swing states were identified to be Arizona, Georgia, Wisconsin, and Pennsylvania (based on the clustering history) as well as Michigan and Minnesota (based on the Swing State Index). After analyzing specific events that affected these six states’ voting behavior, the Black Lives Matter movement and concerns over health care were the most significant factors in President Trump’s defeat. Next, the state of Georgia was further studied to better understand the influence of COVID-19 and the economy on the state’s voting behavior. By adjusting the ratio of the COVID-19 values (infected cases and deaths) and economic value (unemployment rate), it was found that the economy was of greater importance than COVID-19 to Georgian voters. The study of similar events by connecting political science (e.g. government decision-making) and clustering methods can be applied to future elections to better predict the outcome of important swing states and, thus, the overall election results.  All calculations and analysis are done on the JMP 15 platform.     Auto-generated transcript...   Speaker Transcript Saloni Patel Okay, so hello, my name is Saloni and today I'll be presenting our project, the United States presidential election prediction model and swing states study behaviors study. There are two parts in this project, the first involves creating and evaluating a model meant to predict the 2020 US presidential election. 
The second part of the project will study swing state behavior in the 2020 US presidential election and identify key events that affected the voting patterns in the election using hierarchical clustering methods. All the analysis was done on the JMP 15 platform. To clarify our project does not focus on all 50 US states and instead we will only study the top 15 swing states. The swing states are states that can reasonably be won by either the Democratic or Republican presidential candidate, as opposed to safe states that consistently lean towards the one party. Additionally, the US voting system depends on the Electoral College system that gives a set number of votes to each state based on population numbers. There is a total of 538 electoral votes so a presidential candidate must get 270 electoral votes to win the presidential election. Since most of the states are known to vote for either a Democratic or Republican candidate without hopes of being swayed out of the normal voting pattern, the Electoral College system and the presidential election result depends on the bulk of the swing states that can potentially be won by any of the candidates. A win by even a small margin results in that candidate acquiring all the votes the state has to offer, so swing states are especially impactful in determining the next president. So, to begin we conducted this project in hopes of better understanding the historic 2020 US election that occured in the middle of a global pandemic and socially as well as economically unstable times. The first part of our project's objectives is to identify key swing states, create a prediction model based on the influence of COVID 19 and the economy in those identified swing states, and lastly validate the prediction model with the actual election results once those came out. So the first step in our prediction model is identifying top 15 swing states from the past three elections using this swing state index. We use this formula to determine whether the states are swing states or not. It is also important to note that this swing index does not take into account which side each state votes for, but rather on the election results itself. In other words a state could have voted for the same side all three years yet by very different margins and still be counted as a swing state. We will further study this index in the next part of the project, but right now, all we use this index is to...for is to identify the top 15 swing states. Once the swing states are identified, we derive the first value we will need to calculate the predicted 2020 election result. This is the 2016-2012 composite win margin. To calculate this value we took the 2016 result and the 2012 election result. In the formula, we gave the 2016 results twice the weight because it was more recent than the 2012 election and we gave another twice the wait for the 2016 election. Because President Trump was present in the 2016 election running as president, while Joe Biden was present in the 2012 election as a candidate for vice president. In total, the 2016 results will have four times the weight as in 2012 in the 2016-2012 composite win margin. Next we identify factors that are unique to the 2020 and factors that voters may vote according to. We found that the global COVID-19 pandemic and the following hit the economy took were important factors unique to 2020 so we collected the infected cases and death cases due to COVID-19, as well as the unemployment increase in each state. 
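Before moving on to the standardization of those factors, here is a small sketch of the 2016-2012 composite win margin weighting described above, read as a 4:1 weighted average (the 2016 result carries four times the weight of the 2012 result). The example margins are round placeholder numbers, not the actual election data, and the authors' exact formula may differ.

# Sketch of the composite win margin under an assumed 4:1 weighted-average reading.
margins = {                          # placeholder margins in percentage points (+ = Republican)
    "State A": {"2012": 8.0, "2016": 5.0},
    "State B": {"2012": -5.0, "2016": 1.0},
}

def composite_win_margin(m_2012: float, m_2016: float) -> float:
    # 2016 weighted twice for recency and twice again for candidate overlap, i.e. 4:1 overall
    return (4 * m_2016 + 1 * m_2012) / 5

for state, m in margins.items():
    print(state, round(composite_win_margin(m["2012"], m["2016"]), 2))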
Next we applied the Z standardized transformation to avoid any sampling mean and variance biases. Using those Z scores, as Z-infected, Z-deaths and Z-unemployment, we derived the Z-COVID index. This index will represent the impact the global pandemic and the following economic hit each state experienced. Lastly, we calculated the 2020 predicted result, using the 2016-2012 composite win margin and the Z-COVID index. Once the 2020 election passed, we recorded the 2020 actual election results and proceeded to validate our prediction model and whether our choice of factors did a good job helping predict the 2020 election result. Additionally, since the media before the election had predicted that Trump will lose 3 to 5% of the votes from 2016, we decided to subtract certain percentages from the predicted results. In the table below, that predicted result is the zero percent category and the reductions can be seen as well. To analyze the results we compared the predicted results with the 2020 actual election results, using the regression and paired t-test. To compare how the 2020 predicted results compared with previous election results at predicting the 2020 actual result, we also include the 2012 election and 2016 election results in our evaluation. Lastly, we also conducted a 1-proportion hypothesis test to test the 2020 predicted results accuracy. To begin we conducted a regression test with the election results presented just from each state. The 2012 result compared to the 2020 actual election result did not yield a significant result. However, the regression tests between 2016 actual and 2020 actual displays a significant result and the highest R squared of 0.81. The results of the regression is also close to 1, at about 1.17, suggesting a strong regression relationship. The regression between the 2020 predicted and the 2020 actual also displays a significant result but a lower R square of 0.3. From these results, the regression between the 2016 actual and 2020 actual results had the highest R Square and slope closest to one, despite having declared a different winner. It is reasonable to find that the 2020 election results would be correlated with the 2016 election results since Trump lost those swing states narrowly wone in the 2016 election by small margin, so just Michigan, Pennsylvania, and Wisconsin. Next, the paired t-tests that will also compare the election results percentages of each state. we use the paired t-test because the same states or pairs are being assessed against each other. The paired t-test only found a significant difference in the means of the 2016 actual election results and the 2020 actual results, which makes sense since these results had a high regression test significance. This would suggest that the means of the 2012 election and the 2020 predicted results are similar to the 2020, actual meaning that the election results are similar. In the 2012 election, the Democrats had won the election and the predicted results had predicted that Democrats would win in 2020, while in 2016 the Republicans had won the election. This can explain why 2016 is significantly different from the 2020 actual election results, while the... while the 2012 actual and 2020 predicted are not. Lastly, we use the 1-proportion hypothesis test to test how the 2020 predicted results matched with the 2020 actual results. Unlike the regression and paired t-test, the 1-proportion test compares the states and which side they voted for. 
The regression and paired t-test only compared the election results without any indication on which side the states voted for. Therefore, this test is more powerful and validating the prediction model, since it compares the predicted side each state would vote for and which side the states actually voted for. We assign the states that voted for the predicted side with a pass and those that did not vote for the predicted side with a fail. In total 12 out of 15 states received a pass, as they were predicted accurately, while the other three received a fail. We set the success value at pass and the scale is a sample proportion of 0.8. Since we want the sample proportion to be greater than 0.9 or 90% accurate, we set the hypothesized proportion to 0.9. Since the 0.8 proportion failed to exceed the 0.9 at the 95% confidence level, the prediction model failed to be 90% accurate, failed to reject the null hypothesis at the 95% confidence level. According to the proportion of our sample this model is 80% accurate. To summarize the regression test showed significance between the 2016 actual results and the 2020 actual election result., as well as a weaker significance between 2020 predicted and 2020 actual election results. We theorize that this may have been because this election, President Trump lost those swing states narrowly won in the 2016 election. The paired t-test showed significant difference between the 2016 actual and 2020 actual, and we theorized that this may have been because those two elections declared different winners. President Trump won the 2016 election yet lost the 2020 election. Additionally the 2012 and 2020 predicted results are not significantly different from the actual 2020 result...election result, which may have been because they both declared the same political party as the winners. As...lastly, the 1-percent hypothesis test failed to reject the null hypothesis, and so our prediction model is not 90% accurate at the 95% confidence level. Arizona, Wisconsin, and Minnesota, which could suggest that there were other major factors besides the impacts of COVID-19 and unemployment rates that influenced the 2020 election result. This is where we transition to the next part of our project in which we will group states based on their swing state index and identify them with key events that took place in 2020 that could have influenced the swing states' voting behaviors. So the questions this part of the project will address from the last is which events and factors influence the swing states to vote the way that they did. How much more or less did voters care about COVID-19 than the economy and other side investigations? Can we use statistical tools to link political events with voting patterns? The goals for this project is to study the previously identified swing states voting patterns by linking statistical clustering methods to political events. We will also adjust the Z-COVID index, or as we will now call it Z-Ratio, with new ratios to better understand the importance of COVID 19 and the economy in voting behavior. Previously the Z-COVID index had two by one ratio, where the values of COVID-19 infected cases and deaths were given twice the weight compared to the unemployment increase value, since there were two values for COVID-19 and only one meant for the economy. We realized that each State was impacted differently by the pandemic, so we thought it would be appropriate to analyze the effects of switching this two by one ratio to other ratios. 
First, we go back to the swing state index, which helps identify the swing of each state using the election result percentages from the past three elections. A negative election result indicates that the state voted Democratic, while a positive one indicates a Republican vote. The larger the magnitude and the more negative the swing index, the more that state's voting pattern has swung. If the state changes direction, then the signs of the two differences will not be equal, causing the swing state index to be negative and display more of a swing behavior. From this table, we can see that Michigan and Minnesota have the negative values of largest magnitude, which means they have been swinging the most over the past three elections. Overall, the swing state index is quite useful for understanding basic voting patterns in the swing states. However, the swing state index cannot identify the key events that caused the voting patterns. We used hierarchical clustering to study states with similar voting patterns and list potential factors that affected their voting behavior. Hierarchical clustering grouped the 15 swing states into four different clusters, as seen on the right. We used this method because of its bottom-up approach, where every state starts as its own cluster before the clusters are merged one at a time and moved up the hierarchy. On the right, Iowa and Ohio can be seen in red, indicating that they are in the same cluster. As mentioned previously, the hierarchical clustering divided the swing states into four clusters. The first cluster consists of Iowa and Ohio. Both of these states voted blue, or Democratic, in the 2012 election, yet red, or Republican, in the 2016 and 2020 elections. The second cluster has Georgia, Arizona, North Carolina, and Florida. All these states except North Carolina became bluer or redder, or in other words are starting to favor one side heavily. The third cluster consists of Wisconsin, Pennsylvania, Michigan, Nevada, New Hampshire, and Minnesota. All of these states, besides Nevada, have a negative swing index, meaning they are the most inconsistent swing states. The last cluster has Colorado, Virginia, and New Mexico, which are all relatively blue states, or states that have consistently voted Democratic, and in the 2020 election voted blue by a larger margin than previously. Now that we have all the clusters and an idea of their characteristics, we looked at the clustering join history, which identifies the top pairs of states, or which two states are the most similar within their clusters. From the join history, the first two pairs are Wisconsin with Pennsylvania from the third cluster and Georgia with Arizona from the second cluster. Both pairs are part of clusters containing states that switched from red to blue in the 2020 election. After further research, we found that in Wisconsin and Pennsylvania there appeared to be concerns about the economy, dissatisfaction with President Trump's healthcare-related policies, such as his efforts to weaken the Affordable Care Act formed under the Obama administration, as well as concerns about the environment, all of which ultimately led the majority of voters to vote Democratic. In Georgia and Arizona, however, major shifts in demographics, such as more registered Latino voters in Arizona, and the Black Lives Matter movement, which exposed serious racial injustice, ultimately caused the majority of voters to cast a Democratic ballot.
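(A minimal JSL sketch of the clustering step described above. The margin column names are hypothetical placeholders, and the linkage method shown is an assumption, since the talk does not name one.)

dt = Current Data Table();
dt << Hierarchical Cluster(
	Y( :Name( "2012 Margin" ), :Name( "2016 Margin" ), :Name( "2020 Margin" ) ),
	Label( :State ),
	Method( "Ward" ),          // assumed linkage; other linkage methods are available
	Number of Clusters( 4 )    // the four clusters discussed in the talk
);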
Through hierarchical clustering we were able to separate states into different groups based on their voting behavior and make connections to the key events that caused the observed voting behavior. Although we found key events that influenced each state's voting behavior, hierarchical clustering did not tell us the weight each event carried in an individual swing state's voting behavior, in particular the relative weight of COVID-19 and the economic recession that followed. Previously, we had to assume that each state would have the same Z-ratio, which gave COVID-19 twice the weight, resulting in a two-to-one ratio for every state. However, from the hierarchical clustering we found that each state has a unique situation and its voters cared about different issues. To adjust the Z-ratio, we created a value called the Ratio Variable. The Ratio Variable determines the ratio of the importance of COVID-19, or the Z-COVID index, which represents the infected cases and deaths in each state, versus the economy, or the Z-unemployment value, which represents the annual unemployment increase rate in each state. Once the Z-ratio is adjusted with a few different ratio variables, such as 0.1, which creates a one-to-ten ratio giving the economy ten times the importance, it is implemented into the full formula used to calculate the 2020 predicted results. These adjusted 2020 predicted results are compared against the 2020 actual election results to determine which ratio best explains the state's situation and how much importance COVID-19 and the economy had in influencing voting behavior. We decided to study Georgia's voting behavior closely, since it appeared to stand out compared to the other swing states. For one, Georgia was the first state to reopen businesses in April, while the rest of the states did not. Additionally, Georgia was a key state in the 2020 election, which President Trump kept an eye on even after the election results were announced, in attempts to overturn them. Georgia voted blue by a small margin and had an election result of -0.3%. The adjusted 2020 predicted results for Georgia were plotted, and a marker for Georgia's actual election result was placed on the graph on the right. From the graph we see that the adjusted 2020 predicted result with a ratio variable of 0.75 had a value of -0.2%, which is the closest to Georgia's actual election result of -0.3%. The 0.75 ratio variable means that the ratio is three to four, indicating that the economy was the more important issue to the majority of voters in Georgia. This makes sense because, as mentioned previously, Georgia was the first state to reopen businesses in April, indicating a strong concern for the well-being of its businesses and economy. In this project we explored different key events and their importance in influencing the voting behavior of the 15 identified swing states using statistical methods. First, hierarchical clustering was used to group the swing states based on their voting behavior in the past three elections. From this we found that the second cluster, consisting of Arizona, Georgia, and others, was mostly affected by issues regarding civil rights, while states such as Pennsylvania and Wisconsin in the third cluster voted for Joe Biden due to concerns about the economy, healthcare, and the environment. Overall, we examined the influence that the worsening COVID-19 situation, racial movements such as the Black Lives Matter movement, and the economy had on each state.
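(A minimal JSL sketch of the ratio-adjusted index. The exact aggregation formula is an assumption made for illustration, with r : 1 taken as the COVID-to-economy weight so that r = 0.75 corresponds to the three-to-four ratio discussed for Georgia; the Z-score column names are the same hypothetical placeholders used earlier.)

r = 0.75;    // Ratio Variable: weight of COVID-19 relative to the economy
dt = Current Data Table();
dt << New Column( "Z Ratio Index", Numeric, Continuous,
	Set Each Value( (r * (:Z Infected + :Z Deaths) / 2 + :Z Unemployment) / (r + 1) )
);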
Georgia was explored in more detail, and it was found that a three-to-four ratio matched best with the actual election result, suggesting that the economy was a more important issue to voters than COVID-19. This makes sense because Georgia was the first state to reopen businesses in April. Thank you for listening to my presentation.
Anne-Catherine Portmann, USP Development Scientist, Lonza Sven Steinbusch, Senior Project & Team Leader, Microbial Development Service USP, Lonza   Often, the analysis of big data is considered essential in the fourth big industrial revolution – the “Data-Based Industrial Revolution” or “Industry 4.0.” However, the challenge of unstructured data, or a less than in-depth investigation of the data, prevents the full potential of existing knowledge from being used. In this presentation we offer a structured data handling approach based on the “tidy data principle,” which allowed us to efficiently study the data from more than 80 production batches. The results of different statistical analyses (e.g. predictor screening, machine learning or PLS) were used in combination with existing process knowledge to improve the overall product yield. With the newly created knowledge, we were able to identify certain process steps that have a significant impact on the product yield. Additionally, several models demonstrated that the overall product yield can be improved by up to 26 percent through the adaptation of different process parameters.     Auto-generated transcript...   Speaker Transcript Anne-Catherine Portmann Hello, today I will present to you the power behind data. This presentation is based on the tidy data principle, which allowed us to efficiently study the data of more than 80 production batches. We were able to improve the product yield by up to 26% based on process knowledge and statistical analysis. The statistical analysis also allowed us to identify the key process steps that have an impact on the product yield. So I will first introduce Lonza Pharma Biotech and then we will go to the historical data analysis. Lonza Pharma Biotech was founded in 1897 and shortly thereafter was transformed into a chemical manufacturer. Today we are one of the world's leading suppliers to the pharmaceutical, healthcare and life sciences industries. Here in Visp, we are one of the biggest Lonza sites and the most significant for R&D, development and manufacturing. We also have a new part of the company, the Ibex solutions, where we are able to cover complete biopharmaceutical cycles from preclinical to commercial stage, from drug substance to drug product, all of this in one location. You have probably heard about this lately with the Moderna vaccine against COVID-19, but it is not the only product that we are producing here in Visp. We are also producing small molecules, mammalian and microbial biopharmaceuticals, highly potent APIs, peptides and bioconjugates, including antibody-drug conjugates. Now that you know a little bit more about Lonza, I will go to the historical data analysis. First of all, I will present the process on which the 80 batches were run. First, the upstream part. The upstream part starts with the fermentation, where the product is generated by the microorganisms: the microorganisms produce the product from the DNA during fermentation. Then we have the cell lysis, where we disrupt the cell membrane to release the product and everything that is in the cell, so that we have access to the product. Then comes the separation. In the separation part, we remove the cell fragments, such as the cell membrane or the DNA. Then we come to the downstream part, which is based on three different chromatography steps and allows the purification of the product.
So the product is shown in yellow in the lower part of the slide, and we can see that during each of the chromatography steps we are able to purify the product a little bit more. At the end, we perform a sterile filtration of the product. The goal was to increase the overall product yield, and to do that we first collected the data of the 80 batches and organized them in a way that we could analyze. Then we performed a yield analysis and discussed the results with the process experts, the SMEs (subject matter experts). Then we went to the data analysis for the upstream part and performed the four analyses listed on the left of the slide. Then we did the same for the downstream part, focusing on Chromatography 1. At the end, we drew conclusions from everything we saw in the analyses and from what the subject matter experts told us, and finally we recalculated the yield. Let's see how we organized our data. We based the data handling on the tidy data principles. That is a big part of the work before the analysis, and it takes time, but it is really important to have clean data in order to make an efficient analysis afterwards. First, each table corresponds to one observational unit, for example the fermentation. Then, each row of the file contains one batch. Each column holds one parameter, for example, for fermentation, the pH, the temperature, and the titer (that is, the amount of product at the end). Each cell then contains the value corresponding to that column and that batch. With this structure we can go to JMP and perform the analysis. So let's see how we calculated the yield. We calculated the yield for each step, beginning at 100% for fermentation, and looked at how it decreases along the process. What we observed is that we have a big variation at the fermentation step. Then we have a decrease in the product amount at the separation step, as well as at the Chromatography 1 step. We went with these data to the subject matter experts, and they told us that the complex media variability impacts the final titer of fermentation, so we had to explore this spot. Then, for the separation, the strategy that was chosen could have a different impact on the mass ratios. And for Chromatography 1, the pooling strategy most probably has an impact. So then we will see what the data said. We looked at the upstream part and performed different analyses. The first analysis was the multivariate analysis of each of these USP process stages. We focused on the fermentation, cell lysis, and separation, and looked at how all the parameters could correlate with the final product. Here, for fermentation, what we see is that the amount added to Reactor 1 had a medium correlation with a good significance probability. For separation, the final mass ratio and the mass ratio at the intermediate separation both have a major impact with a significant probability. You can see that other parameters, such as the initial pH of Reactor 2, are very close to the medium correlation threshold and have a significant probability, and we will see this parameter again in the next analyses. We also kept only the parameters that are scientifically meaningful for the other analyses. Then we went to the partial least squares. For the partial least squares, we see that for fermentation we have a positive correlation for all these parameters.
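(A minimal JSL sketch of the kind of multivariate correlation screen described above; the column names are hypothetical placeholders for the fermentation and separation parameters.)

dt = Current Data Table();
dt << Multivariate(
	Y( :Name( "Amount Reactor 1" ), :Name( "Amount Reactor 2" ),
	   :Name( "Initial pH Reactor 2" ), :Name( "Final Titer" ) ),
	Pairwise Correlations( 1 )   // correlations together with their significance probabilities
);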
So again we see the amount of Reactor 2, the initial pH of Reactor 2, and the initial amount of Reactor 2, as well as a new parameter, the hold time. We also see that the amount of Reactor 2 has a positive correlation in this analysis but a negative one in the MVA. This can be explained by the fact that the 80 batches were production runs; they were not designed to answer the question of a positive or negative correlation with the final product. That could be done in the future in another analysis with a proper design. We can still say that these parameters have an impact on the final product. For the other parameters, at the other steps of the upstream part, we also see that the prediction matches the multivariate results, and we have a possibility to improve the titer. Here we see with the prediction profiler that we can also optimize the product yield in the future. Then we tried the predictor screening. We ran the predictor screening 10 times, and the parameters that were always found in the top 10 were selected. What we see are the initial pH of Reactor 2, the mass ratio at the end of separation, the mass ratio at the intermediate separation, the initial amount of Reactor 2, the amount of Reactor 2, and the amount of Reactor 1. So again, the same parameters appear to have an impact. Then we went to machine learning. This machine learning analysis, XGBoost, is a decision-tree machine learning algorithm. To avoid having parameters in the result that are not really at the top of our parameter list, we included a fake parameter, which gives us a kind of threshold in the parameter importance. All the parameters that appear above this threshold were considered to have an impact. The others are considered to be random, falling below this random parameter, and to have no impact, or no significant impact, on the final product. Here we can see that a negative correlation appears for Reactor 1. For the pH of Reactor 2 and the initial amount of Reactor 2, we have a positive correlation, and for the mass ratio, we have a negative correlation. Again, as I explained before, distinguishing between negative and positive correlation was not the goal and the experiment was not designed for it, so we know there is an impact, but we do not yet know whether it is positive or negative. Then we go to the downstream part, specifically Chromatography 1, and here we used neural predictive modeling. In the neural predictive modeling we used the different fractions of the chromatography. On the graph on the right, we see that Fraction 8 is the main fraction, where we find most of the product and the highest purity. Then, going down from Fraction 7 to Fraction 1, we still have product, but also more impurity. Until now, we were taking fractions down to Fraction 4 into account in our analysis, and we would like to see if we can also include Fractions 3, 2 and 1. What we saw is that by increasing the number of fractions, we increase the yield while decreasing the purity only very little. On the graph on the left, we see that if we go to Fraction 2, we decrease the purity by less than 1% while the yield increases by about 5%. When we then include Fraction 1, we have a bigger decrease in purity, a little more than 1%, while the yield on the other side increases by about 10%. With these results, we then tried to summarize everything together.
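(A minimal JSL sketch of the predictor screening step; the column names are hypothetical placeholders modeled on the parameters named in the talk. In the talk the screening was rerun 10 times and only the parameters that always ranked in the top 10 were kept.)

dt = Current Data Table();
dt << Predictor Screening(
	Y( :Name( "Final Titer" ) ),
	X( :Name( "Amount Reactor 1" ), :Name( "Amount Reactor 2" ),
	   :Name( "Initial pH Reactor 2" ), :Name( "Initial Amount Reactor 2" ),
	   :Name( "Mass Ratio Final" ), :Name( "Mass Ratio Intermediate" ) )
);
// Rerunning this launch (for example in a For loop) and comparing the rankings
// across runs reproduces the repeated-screening idea described in the talk.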
So for fermentation, we have the final volume of the tanks and reactors, which was identified by most of the methods. The initial pH of the fermenter was also identified by the different analysis methods, and the complex compound variability by the process experts. To be able to see the effect of the complex compound variability, we will need further investigation in the lab. Then, for separation, we have the mass ratio, which was identified by some of the analysis methods but also by the process experts. The strategy is very interesting: the process experts decided to look at it and try some tests to be able to improve the yield in production. For Chromatography 1, the pooling strategy was identified by the process experts and the neural network analysis. Here the method can be easily implemented in the lab and also in production, and the yield really increases a lot with this method. Then we recalculated, with the prediction profiler, how much we can increase the yield of the different steps. For fermentation we were able to increase the yield by up to 16%. For the separation, we are able to increase it by up to 5%, and for Chromatography 1 by up to 5% as well. On the other slide we wrote up to 10%, so we took the worst-case scenario and say up to 5%. At the end, we have a total increase of up to 26%, so this is a good way to improve our process and to focus exactly on the parts where we can have a big impact. And it is just based on the data, without doing a lot of experiments in the lab; it is also cheaper to do these analyses with JMP than to run a lot of experiments in the lab. So we have a lot of gains at the end. Thank you very much to all of you for listening to me today, and also a big thank you to my colleagues Ludovic, Helge, Lea, Nichola and Sven for their help with this presentation.
Georg Raming, Senior Manager, Siltronic AG   In the course of daily work, users often need to analyze the same or similar data from distributed sources. Because these users are rarely involved in defining the IT infrastructure, it is often the case that the needed data is located across a variety of different platforms (databases, fileservers, etc.). Users are then forced to spend a lot of time querying and combining the data to get it in a form appropriate for analysis. JMP offers several possibilities to query data from different sources and to connect them afterwards. In this presentation, some examples of workflows are shown that can be used to efficiently get data into table(s) and effectively meet the requirements of analytical users. Methods used to accomplish these tasks include Query Builder, SQL, JSL, JMP add-ins and Virtual Join.     Auto-generated transcript...   Speaker Transcript Georg Raming (Siltronic AG) Hello everybody. Today I want to talk about strategies and examples for data acquisition from distributed and complex sources. My name is Georg Raming; my job is process development of ??? grown single crystals at Siltronic. I have some experience with statistical evaluation of process and product data and with statistical education. I am also responsible for the JMP software at Siltronic and for training activities for a few hundred users. Siltronic is one of the world's leading manufacturers of highly specialized hyperpure silicon wafers. Some technical hints: I am working with data tables instead of a database in this presentation, but the concepts shown here are originally used for getting data from relational databases via an ODBC connection, and the JMP Query Builder works in a similar way for JMP data tables and for database tables. What is the target of today's presentation? It is to establish some ways of getting data from a database into a JMP table in an easy and efficient way. Let's first talk about the building blocks. One may be a JMP data table. If you have generated a data table from a database query, it may look like this. I have taken here the famous sample table from JMP, Big Class, and I have queried the data via the JMP Query Builder. I deleted all the scripts that come with the table. For a data table queried with the JMP Query Builder, you will find several scripts and a table variable. Inside that variable, you can see the SQL, that is, the definition of how the data table is drawn from the source table or database. And you have these scripts: a source script that simply gets another copy of the original data table, like this. Here you can see the original scripts are also in there, but there are again these scripts from the JMP Query Builder. There is also a modify query script that lets you edit this query on the JMP data table and also run it. And there is a query for update from database. So if you change the data in your data table, or there is new data in the data table, you can simply update the data by pressing this button. Like this. Here the scripts have again come from the sample table. Okay, so the next building block is the JMP Query Builder. Once you have one or several tables at the database that you want to query, like here, where I took Big Class and Big Class Families from the JMP sample table directory, you use the JMP Query Builder on the tables, like here: Tables, JMP Query Builder.
JMP sees these open tables, and there is also one in the primary field, and I can add a secondary table like this; JMP then automatically creates a join between both tables, that is, how they are joined. And you can edit this join, like this. You see the details here. Then you can go on building the query, like this. You have both tables available here. Here you can also edit the join again. And maybe you want to add all columns, like this. And maybe you want to put a filter, like this. And run the query. Then you again get this result table, with the scripts from the sample tables that we do not need here; I delete them. And these are the scripts and the variable written by the JMP Query Builder, like I used before. And again, here you can edit this query. The script for the query is saved in this result table, just as we defined it before. So this is how the JMP Query Builder works. And as mentioned before, it works the same way on database tables as it works here on the JMP sample tables. For the next building block, I also put some scripts here, and you can always switch from the visual Query Builder to a custom SQL query. This works here, in another red triangle menu: Convert to Custom SQL. What you have defined here visually, you can then define here as a text SQL query. You can run it. And when looking at this query, or pressing the Modify Query button, you will see the text query. Note that this conversion works only in one direction, from the visual query to the custom SQL query. So let me just tidy up some tables. Okay. The next building block is getting data from a database, and for that you need an ODBC connection to the data sources. Usually the ODBC data sources and drivers are set up by your IT administration. Under Windows, you can find these connections in the ODBC Data Source Administrator. Please keep in mind that the bitness of the database sources should match JMP: in this case I have a 64-bit JMP application and therefore I need 64-bit drivers. The user data sources you can set up yourself, while system DSNs are set up by the administrator, and drivers for the different types of databases are also set up by the administrator. In case of troubleshooting, you may use this tab for tracing where the query went and what the problem may be. A tip from my side would be to check whether the connection works properly from these data sources: if you press Configure in the Data Source Administrator, you can, for a certain source, test the connection after you have provided your credentials. If everything is okay, the system tells you that the connection succeeded, so if the query then does not work, it is not up to Windows. Other details you can find in the JMP manual, in the documentation, for example Using JMP, chapter three; you can find it here in the Help menu, in the JMP documentation library, and there it is well documented how to use all these tools. Okay, so in scripting, in JSL, the scripting language, there are three ways to script these SQL database queries. The first one is New SQL Query, as we have already used before. It is located here in the JMP menu under Tables, JMP Query Builder for JMP tables, or, for a database, you can find it under Database, Query Builder. New SQL Query is the most powerful command of all: it generates a new query object that you can save as a JMP query file.
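(A minimal JSL sketch of a New SQL Query call with a custom SQL statement; the DSN, table and column names are placeholders, not taken from the talk.)

query = New SQL Query(
	Connection( "ODBC:DSN=MyDatabase;" ),
	QueryName( "BigClassQuery" ),
	CustomSQL( "SELECT name, age, sex FROM BigClass WHERE age >= 13" )
);
dt = query << Run Foreground();   // returns the result as a JMP data table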
It can generate a data table directly and documents the origin of the data very well, as we have seen before with the source, modify, and update scripts. Another command in JSL to query data from a database is the Open Database command. You can find documentation of all these commands in the Scripting Index; like here, if I type Open Database, you can find how it works, as for all commands. And a third way to connect to a database and send some queries is a database connection. It needs three steps: the first one is to create a database connection, the second is to send your queries, one or several, to the database, and to finalize, you need to close the database connection. And finally, if you wrote some nice scripts, you should make them accessible to yourself, or maybe to colleagues, and there are at least two ways. One is to put them in an add-in so that you can also provide them to others easily, and the other possibility is to put them in a custom menu, like I did here. You can find it in the menu, then, here like this. There are two add-ins installed on my system, and here is my personal custom menu. So, we have finished the building blocks; let's go to the examples. The first example is to use a table script to save the table layout and the query. It may look like this: if you have a data table that comes from a database and you have put some scripts in it, for a nice graph also, you may want to use it every day. And there is a nice possibility here to say Copy Table Script Without Data. This script you can then use. I would like to make a new script, sorry, and paste it into the script window. And here comes that new table, by simply pressing the Run Script button. It comes empty, without data. You can save this script, for example, in your menu, and to get the data, which may be a huge amount of data, you can simply press the Update from Database button and the table gets filled with all the data from the database or somewhere else. So this is a nice possibility to simply use the script to get the data from the database. The second example is to use a query from inside a table, like this: you have a data table and you want to query some additional data, depending on the content of your current table. I will show it. I delete some rows, and with these three names I want to go to a different table to fetch some data. It is done by this script, and here you can see that from this table I got the names and made a query to fetch data from another table, Big Class Families. How does it work? You can find it in this script; it is quite short. Here, the names are taken from the first table, as defined like here. The names are substituted into the query, and here the query is sent to the table Big Class Families. Okay, I need to close this table too. The next one is a custom query script with a graphical user interface. If you have, for example, a large data table on the database and often need small amounts of it, you can of course pull all the data, but maybe it is more flexible and efficient to get well-filtered data. It works like this: there are two tables, one table for filtering the data and one table with all the columns of the big data table. Here we can filter graphically, let's say female and age, so I took only two columns to filter, like this, 12 and 13 years old. And I may want to have all columns or remove just one column.
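(Before the examples continue, here is a minimal hedged sketch of the other two JSL routes just mentioned; the DSN, credentials and table names are placeholders, not taken from the talk.)

// Open Database: connection string, SQL statement, name of the output table
dt = Open Database( "DSN=MyDatabase;UID=user;PWD=secret;",
	"SELECT * FROM BigClass", "Big Class from DB" );
// Explicit connection: create it, send one or more queries, then close it
dbc = Create Database Connection( "DSN=MyDatabase;UID=user;PWD=secret;" );
dt1 = Execute SQL( dbc, "SELECT * FROM BigClass", "Big Class from DB" );
dt2 = Execute SQL( dbc, "SELECT * FROM BigClassFamilies", "Families from DB" );
Close Database Connection( dbc );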
I query it, and here I get the proper result with these restrictions. And as you can see here in the source script, or in the SQL, the restrictions defined in the graphical user interface are applied. And how does the scripting work? It is a little more complex, of course. There is the filter query and the columns query. Here is a script, a function, that converts the data filter conditions into an SQL condition. This is the GUI part, the graphical user interface. And here, finally, the graphical user interface results are evaluated and put into this custom script. The next example is a two-step query. Let's assume you first have to query some batch IDs and then take these IDs into another query, maybe to another data table, to get additional data. You can do it like this. Here I took the Big Class data table as the filter, so I filtered only the male pupils who are 13 years old or younger, and took these names to query Big Class Families, all rows, as you can see here. Both tables are connected via a virtual join, as you can see here. This table refers via the name to the other table, the filter table, and I can use the columns of the filter table here in this table as if they were in this table. So this is a nice way to have several tables and use the content of one table in another. And the last example is how to run two or several queries in the background in parallel. I have no example here; it is discussed in the JMP Community, and if you are interested, you can have a look there. So now I am finished with my examples, and the journal, presentation material and scripts are available online. Thanks to the Community for the wonderful discussions, and thanks to the JMP developers for building and maintaining this great piece of software. And finally, thank you for your attention.
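(A hedged JSL sketch of the virtual join setup described in the two-step query example; the table names, column name and file path are placeholders, and the scripting follows the Link ID and Link Reference column properties used for virtual joins.)

dtFilter = Data Table( "Filter Table" );
dtMain = Data Table( "Big Class Families" );
// The key column in the filter table becomes the Link ID ...
dtFilter:name << Set Property( "Link ID", 1 );
// ... and the matching column in the main table references the filter table,
// making the filter table's columns available in the main table.
dtMain:name << Set Property( "Link Reference", Reference Table( "Filter Table.jmp" ) );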
Ron Kenett, Chairman, KPA Group and Samuel Neaman Institute, Technion Christopher Gotwalt, JMP Director, Statistical R&D, SAS   Data analysis – from designed experiments, generalized regression to machine learning – is being deployed at an accelerating rate. At the same time, concerns with reproducibility of findings and flawed p-value interpretation indicate that well-intentioned statistical analyses can lead to mistaken conclusions and bad decisions. For a data analysis project to properly fulfill its goals, one must assess the scope and strength of the conclusions derived from the data and tools available. This focus on statistical strategy requires a framework that isolates the components of the project: the goals, data collection procedure, data properties, the analysis provided, etc. The InfoQ Framework provides structured procedures for making this assessment. Moreover, it is easy to operationalize InfoQ in JMP. In this presentation we provide an overview of InfoQ along with use case studies, drawing from consumer research and pharmaceutical manufacturing, to illustrate how JMP can be used to make an InfoQ assessment, highlighting situations of both high and low InfoQ. We also give tips showing how JMP can be used to increase information quality by enhanced study design, without necessarily acquiring more data. This talk is aimed at statisticians, machine learning experts and data scientists whose job it is to turn numbers into information.     Auto-generated transcript...   Speaker Transcript   Hello. My name is Ron Kenett. This is a joint talk with Chris Gotwalt, and we basically have two messages that should come out of the talk. One is that we should really be concerned about information and information quality. People tend to talk about data and data quality, but data is really not the issue. We are statisticians, we are data scientists; we turn numbers, data, into information, so our goal should be to make sure that we generate high quality information. The other message is that JMP can help you achieve that, and this actually turns out to happen in surprising ways. So by combining the expertise of Chris and an introduction to information quality, we hope that these two messages will come across clearly. If I had to summarize what it is that we want to talk about, after all, it is all about information quality. I gave a talk at the Summit in Prague four years ago, and that talk was generic; it talked about my journey from quality by design to information quality. In this talk we focus on how this can be done with JMP. This is a more detailed and technical talk than the general talk I gave in Prague. You can watch that talk; there is a link listed here, and you can find it on the JMP Community. So we are going to talk about information quality, which is the potential of a data set, a specific data set, to achieve a particular goal, a specific goal, with a given empirical method. In that definition we have three components that are listed. One is a certain data set; here is the data. The second one is the goal, the goal of the analysis, what it is that we want to achieve. And the third one is how we will do that, that is, with what methods we are going to generate information. That potential is going to be assessed with a utility function.
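(For reference, the definition just stated is usually written compactly in the published InfoQ work as a utility applied to the analysis of the data conditioned on the goal:

\mathrm{InfoQ}(f, X, g) = U( f(X \mid g) )

where g is the analysis goal, X the available data set, f the empirical analysis method, and U the utility measure.)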
And I will   begin with an introduction to   information quality, and then   Chris will will take over,   discuss the case study and and   show you how to conduct an   information quality assessment.   Eventually this should   answer the question how JMP   supports InfoQ, that   would be the the bullet points   that you can...the take away points   from the talk. The setup for   this is that we we encourage   what I called a lifecycle view   of statistics. In other words,   not just data analysis.   We should know...we should be part   of the problem elicitation   phase. Also, the goal   formulation phase, that deserves   a discussion. We should   obviously be involved in the   data collection scheme if it's   through experiments or through   surveys or through observational   data. Then we should also take   time for formulation of the   findings and not just pull out   printed reports on on   regression coefficients   estimates and and their   significance, but we should   also discuss what are the   findings? Operationalization of   findings meaning, OK, what can we   do with these findings? What are   the implications of the   findings? This should should...   needs to be communicated to the   right people in the right way,   and eventually we should do an   impact assessment to figure out,   OK, we did all this; what has   been the added value of our   work? I talked about the life   cycle of your statistics a few years   ago. This is the prerequisite,   the perspective to what   I'm going to talk about. So as I   mentioned, the information   quality is the potential of a   particular data set to achieve a   particular goal using given   empirical analysis methods. This   is identified through four components the goal, the data,   the analysis method, and the   utility measure. So in a in a   mathematical expression, the   utility of applying f to x,   condition on the goal, is how we   identify InfoQ, information   quality. This was published in   the Royal Statistical Society   Series A in 2013 with eight   discussants, so it was amply   discussed. Some people thought   this was fantastic and some   people had a lot of critique on   that idea, so this is a wider   scope consideration of what   statistics is about. We also   wrote in 2006, we meeting myself   and Galit Shmueli, a book called   Information Quality. And in   the context of information   quality we did what is called   deconstruction. David Hand has   a paper called Deconstruction   of Statistics. This is the   deconstruction of information   quality into eight dimensions. I   will cover these eight dimensions.   That's my part in the talk and   then Chris will show how this   is implemented in a specific   case study.   Another aspect that relates to   this is another book I have.   This is recent, a year ago   titled The Real Work of Data   Science and we talk about what   is the role of the data   scientists in organizations and   in that context, we emphasized   the need for the data scientist   to be involved in the generation   of information as an...information   quality as meeting the goals of   the organization. So let me   cover the eight dimensions. That's   that's that's my intro. The   first one is data resolution. We   have a goal. OK, we we would   like to know the level of flu   because in the country or in the   area where we live, because that   will impact our decision on   whether to go to the park where   we could meet people or going to   a...to a jazz concert. And that   concert is tomorrow.   
If we look up the CDC data on   the level of flu, that data is   updated weekly, so we could get   the red line in the graph you   have in front of you, so we   could get data of a few days   ago, maybe good, maybe not good   enough for our gold. Google Flu,   which is based on searches   related to flu, is updated   momentarily, so it's updated   online, it will probably give   us better information. So for   our goal, the blue line, the   Google trend...the Google Flu   Trends indicator, is probably   more appropriate. The second   dimension is data structure.   To meet our goal, we're going to   look at data. We should...we   should identify the data sources   and the structure in these data   sources. So some data could be   text, some could be video, some   could be, for example, the   network of recommendations. This   is an Amazon picture on how if   you look at the book, you're   going to get some   other books recommended. And if   you go to these other books,   you're going to have more data   recommended. So the data   structure can come in all sorts   of shapes and forms and this can   be text. This can be functional   data. This can be images. We are   not confined now to what we used   to call data, which is what you   find in an Excel spreadsheet.   The data could be corrupted,   could have missing values, could   have unusual patterns which   which would be   something to look into. Some   patterns, where things are   repeated. Maybe some of the data   is is just copy and paste and we   would like to be warned about   such options. The third   dimension is data integration.   When we consider the data from   these different sources, we're   going to integrate them so we   can do some analysis linkage   through an ID. For example, we   would do that, but in doing that,   we might create some issues, for   example, in in disclosing data   that normally should be   anonymized. Data   integration, yeah, that will   allow us to do fantastic things,   but if the data is perceived to   have some privacy exposure   issues, then maybe the quality   of the information from the   analysis that we're going to do   is going is going to be   affected. So data integration   should be looked into very, very   carefully. This is what people   likely used to call ETL   extract, transform and load. We   now have much better methods for   doing that. The join option, for   example, in JMP will offer   options for for doing that.   Temporal relevance. OK, that's   pretty clear. We have data. It is   stamped somehow. If we're going   to do the analysis later, later   after the data collection, and   if the deployement that we   consider is even later, then the   data might not be temporally   relevant. In a common   situation, if we want to compare   what is going on now, we would   like to be able to make this   comparison to recent data or   data before the pandemic   started, but not 10 years   earlier. The official statistics   on health records used to be two   or three years behind in terms   of timing, which made it very   difficult the use of official   statistics in assessing   what is going on with the   pandemic. Chronology of data and   goal is related to the decision   that we make as a result of our   goal. So if, for example, our   goal is to forecast air quality,   we're going to use some   predictive models on the Air   Quality Index reported on a   daily basis. This gives us a one   to six scale from hazardous to   good. 
There are some values   which are representing levels of   health concern. Zero-50 is good;   300-500 is hazardous and the   chronology of data and goal   means that we should be able to   make a forecast on a daily   basis. So the methods we use   should be updated on a daily   basis. If, on the other hand,   our goal is to figure out how is   this AQI index computed, then we   are not really bound by the the   the timeliness of the analysis.   You know, we could take our   time. There's no urgency in   getting the analysis done on a   daily basis. Generalizability,   the sixth dimension, is about   taking our findings and   considering where this could   apply in more general terms,   other populations, all   situations. This can be done   intuitively. Marketing managers who   have seen a study on the on the   market, let's call it A, might   already understand what are the   implications to Market B   without data. People who are   physicists will be able to   make predictions based on   mechanics on first principles   without without data.   So some of the generalizability   is done with data. This is the   basis of statistical   generalization, where we go from   the sample to the population.   Statistical inferences is about   generalization. We generalize   from the sample to the   population. And some can be   domain based, in other words,   using expert knowledge, domain   expertise, not necessarily with   data. We have to recognize that   generalizability is not just   done with statistics.   The seventh dimension is   construct operationalization,   which is really about what it is   that we measure. We want to   assess behavior, emotions, what   it is that we can measure, that   will give us data that reflects   behavior or emotions.   The example I give here   typically is pain.   We know what is pain. How do   we measure that? If you go to a   hospital and you ask the nurse,   how do you assess pain, they   will tell you, we have a scale,   1 to 10. It's very   qualitative, not very   scientific, I would say. If we   want to measure the level of   alcohol in drivers on the...on   the road, it will be difficult to   measure. So we might measure   speed as a surrogate measure.   Another part of   operationalization is the other   end of the story. In other   words, the first part, the   construct is what we measure,   which reflects our goal. The the   end...the end result here is that   we have findings and we want to   do something with them. We want   to operationalize our finding.   This is what the action   operationalization is about.   It's about what you do with the   findings and then being   presented here on a podium. We   used to ask three questions.   These are very important   questions to ask. Once you have   done some analysis, you have   someone in front of you who   says, oh, thank you very much,   you're done...you, the statistician   or the data scientist. So this   this takes you one extra step,   getting you to ask your customer these simple questions What do   you want to accomplish? How will   you do that and how will you   know if you have accomplished   that? We we can help answer, or   at least support, some of these   questions we've answered.   The eighth dimension is   communication. I'm giving you an   example from a very famous old   map from the 19th century, which   is showing the march of the   Napoleon Army from France to   Moscow to Russia. 
You see that the width of the path indicates the size of the army, and then in black you see what happened to them on their way back. So basically this was a disastrous march. We can relate this old map to existing maps, and there is a JMP add-in, which you can find on the JMP Community, to show you with dynamic bubble plots what this looks like. So I have covered very quickly the eight information quality dimensions. My last slide puts what I have talked about into a historical perspective, to really give some proportion to what I am saying. I think we are really in the era of information quality. We used to be concerned with product quality in the 17th and 18th centuries. We then moved to process quality and service quality; this is a short memo proposing a control chart, from 1924, I think. Then we moved to management quality. This is the Juran trilogy of design, improvement and control. The Six Sigma process (define, measure, analyze, improve, control) is the improvement process of Juran, and Juran was the grandfather of Six Sigma in that sense. Then in the '80s, Taguchi came in. He talks about robust design: how can we handle variability in inputs by proper design decisions? And now we are in the age of information quality. We have sensors, we have flexible systems, we are depending on AI and machine learning and data mining, and we are gathering big numbers, which we call big data. Information quality should therefore be a prime interest. I am going to try to convince you, with the help of Chris, that we are here, and that JMP can help us achieve that in a really unusual way. What you will see at the end of the case study that Chris will show is also how to do an information quality assessment on a specific study and basically generate an information quality score. So, going top down, I can tell you this study, this work, this analysis is maybe 80%, or maybe 30%, or maybe 95%. Through the example you will see how to do that. There is a JMP add-in to provide this assessment; it is actually quite easy, there is nothing really sophisticated about it. So I am done, and Chris, after you. Thanks, Ron. So now I am going to go through the analysis of a data set in a way that explicitly calls out the various aspects of information quality and show how JMP can be used to assess and improve InfoQ. First off, I am going to go through the InfoQ components. The first InfoQ component is the goal: in this case the problem statement was that a chemical company wanted a formulation that maximized product yield while minimizing a nuisance impurity that resulted from the reaction. That was the high-level goal. In statistical terms, we wanted to find a model that accurately predicted a response on a data set so that we could find a combination of ingredients and processing steps that would lead to a better product. The data consist of 100 experimental formulations with one primary ingredient, X1, and 10 additives. There is also a processing factor and 13 responses. The data are completely fabricated but were simulated to illustrate the same strengths and weaknesses as the original data. The day each formulation was made was also recorded.
We will be looking at this data closely, so I won't elaborate here beyond pointing out that they were collected in an ad hoc way, changing one or two additives at a time, rather than as a designed or randomized experiment. There are a lot of ways to analyze this data, the most typical being least squares modeling with forward selection on selected responses. That was my original intention for this talk, but when I showed the data to Ron, he immediately recognized the response columns as time series from analytical chemistry. Even though the data were simulated, he could see the structure; he could see things in the data that I didn't see. I found this to be strikingly impressive. It's beyond the scope of this talk, but there is an even better approach based on ensemble modeling using fractionally weighted bootstrapping. Phil Ramsey, Wayne Levin and I have another talk about this methodology at the European Discovery Conference this year. The approach is promising because it can fit models to data with more active interactions than there are runs. The fourth and final component of information quality is utility, which is how well we are able to achieve our goals, or how we measure how well we have achieved them. There is a domain aspect, which in this case is that we want a formulation that leads to maximized yield and minimized waste in post-processing of the material. The statistical analysis utility refers to the model that we fit; what we're going for there is least squares accuracy of our model, in terms of how well we're able to predict what would result from candidate combinations of mixture factors. Now I'm going to go through a set of questions that make up a detailed InfoQ assessment, organized into the eight dimensions of information quality. I want to point out that not all questions will be equally relevant to different data science and statistical projects, and that this is not intended to be rigid dogma but rather a set of things that are a good idea to ask oneself. These questions represent a kind of data analytic wisdom that looks more broadly than just the application of a particular statistical technology. A copy of a spreadsheet with these questions, along with pointers to the JMP features that are the most useful for answering a particular one, will be uploaded to the JMP Community along with this talk for you to use. As I proceed through the questions, I'll be demoing an analysis of the data in JMP. Question 1 is: is the data scale used aligned with the stated goal? The Xs that we have consist of a single categorical variable, processing, and the 11 continuous inputs. These are measured as percentages and are also recorded to half a percent. We don't have the total amounts of the ingredients, only the percentages. The totals are information that was either lost or never recorded. There are other potentially important pieces of information that are missing here. The time between formulating the batches and taking the measurements is gone, and there could have been other covariate-level information that is missing here that would have described the conditions under which the reaction occurred.
Without more information than I   have, I cannot say how important   this kind of covariate information   would have been. We do have   information on the day of the   batch, so that could be used as   a surrogate possibly. Overall we   have what are, hopefully, the most   important inputs, as well as   measurements of the responses we   wish to optimize. We could have   had more information, but this   looks promising enough to try   and analysis with. The second   question related to data   resolution is how reliable and   precise are the measuring devices   and data sources. And the fact   is, we don't have a lot of   specific information here. The   statistician internal the   company would have had more   information. In this case we   have no choice but to trust that   the chemists formulated and   recorded the mixtures well. The   third question relative to data resolution is is the data   analysis suitable for the data   aggregation level? And the   answer here is yes, assuming   that their measurement system is   accurate and that the data are   clean enough. What we're going   to end up doing actually is   we're going to use the   Functional Data Explorer to   extract functional principal   components, which are a data   derived kind of data   aggregation. And then we're   going to be modeling those   functional principal components   using the input variables. So   now we move on to the data   structure dimension and the   first question we ask is, is the   data used aligned with the   stated goal? And I think the   answer is a clear yes here. We're   trying to maximize   yield. We've got measurements for   that, and the inputs are   recorded as Xs. The second data   structure question is where   things really start to get   interesting for me. So this is   are the integrity details   (outliers, missing values, data   corruption) issues described and   handled appropriately? So from   here we can use JMP to be able   to understand where the outliers   are, figure out strategies for   what to do about missing values,   observe their patterns and so   on. So this is this is where   things are going to get a little   bit more interesting. The first   thing we're going to do is we're   going to determine if there are   any outliers in the data that we   need to be concerned about. So   to do that, we're going to go   into the explore outliers   platform off of the screening   menu. We're going to load up the   response variables, and because   this is a multivariate setting,   we're going to use a new feature   in JMP Pro 16 called Robust   PCA Outliers. So we see where   the large residuals are in those   kind of Pareto type plots.   There's a snapshot showing where   there's some potentially   unusually large observations. I   don't really think this looks   too unusual or worrisome to me.   We can save the large outliers   to a data table and then look at   them in the distribution   platform and what we see kind of   looks like a normal distribution   with the middle taken out. So I   think this is data that are   coming from sort of the same   population and there's nothing   really to worry about here,   outliers-wise. So once we've   taken care of the outlier   situation we go in and explore   missing values. So what we're   going to do first is we're going   to load up the Ys as...into the   platform, and then we're going   to use the missing value   snapshot to see what patterns   they are amongst our missing   values. 
It looks like the missing values tend to occur in horizontal clusters, and the same missing values also occur across rows; you can see that with the black splotches here. Then we'll apply an automated data imputation, which saves formula columns that impute missing values in new columns using a regression-type algorithm that was developed by a PhD student of mine named Milo Page at NC State. So we can play around a little bit and get a sense of how the ADI algorithm is working. It has created these formula columns that peel off elements of the ADI impute column, which is a vector formula column, and the scoring impute function calculates the expected value of the missing cells given the non-missing cells whenever it finds a missing value; otherwise it just carries through the non-missing value. So you can see 207 in Y6 there: it's initially 207, but then I change it to missing and it's now being imputed to be 234. I'll do this a couple of times so you can see how it's working. Here I'll put in a big value for Y7, and that has now been replaced. And if we go down and add a row, then all values are missing initially and the column means are used for the imputations. If I were to add values for some of those missing cells, it would start computing the conditional expectation of the still missing cells using the information that's in the non-missing ones. Our next question on data structure is: are the analysis methods suitable for the data structure? We've got 11 mixture inputs and a processing variable that's categorical. Those are going to be inputs into a least squares type model. We have 13 continuous responses, and we can model them individually using least squares, or we can model functional principal components. Now there are problems. The input variables have not been randomized at all. It's very clear that they would muck around with one or more of the compounds and then move on to another one, so the order in which the input variables were varied was kind of haphazard. It's a clear lack of randomization, and that's going to negatively impact the generalizability and strength of our conclusions. Data integration is the third InfoQ dimension. These data are manually entered lab notes consisting mostly of mixture percentages and equipment readouts. We can only assume that the data were entered correctly and that the Xs are aligned properly with the responses. If that isn't the case, then the model will have serious bias problems and problems with generalizability. Integration is more of an issue with observational data science problems and machine learning exercises than with lab experiments like this. Although it doesn't apply here, I'll point out that privacy and confidentiality concerns can be identified by modeling the sensitive part of the data using the to-be-published components of the data. If the resulting model is predictive, then one should be concerned that privacy requirements are not being met. Temporal relevance refers to the operational time sequencing of data collection, analysis and deployment, and whether gaps between those stages lead to a decrease in the usefulness of the information in the study.
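(A minimal JSL sketch of the two screening launches described above; the response column names Y1 to Y13 are placeholders, and the Robust PCA and Automated Data Imputation steps are invoked from the platforms' own options in JMP Pro, as shown in the demo.)

dt = Current Data Table();
// Multivariate outlier exploration on the responses
dt << Explore Outliers( Y( :Y1, :Y2, :Y3 ) );
// Missing value snapshot and imputation options for the same responses
dt << Explore Missing Values( Y( :Y1, :Y2, :Y3 ) );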
In this case, we can only hope that the material supplies are reasonably consistent and that the test equipment is reasonably accurate, which is an unverifiable assumption at this point. The time resolution we have for the data collection is at the day level, which means there isn't much way to verify whether there is time variation within each day. Chronology of data and goal is about the availability of relevant variables, both in terms of whether a variable is present at all in the data and whether the right information will be available when the model is deployed. For predictive models, this relates to models being fit to data similar to what will be present when the model is evaluated on new data, and in that sense our data set is certainly fine. For establishing causality, however, we aren't in nearly as good a shape, because the lack of randomization implies that time effects and factor effects may be confounded, leading to bias in our estimates. Endogeneity, or reverse causation, could clearly be an issue here, as variables like temperature and reaction time could be impacting the responses but have been left unrecorded. Overall, there is a lot we don't know about this dimension in an information quality sense. The rest of the InfoQ assessment is going to depend on the type of analysis we do. So at this point I'm going to conduct an analysis of this data using the Functional Data Explorer platform in JMP Pro, which allows me to model across all the response columns simultaneously in a way that's based on functional principal components; these contain the maximum amount of information across all those columns, represented in the most efficient format possible. I'm going to work on the imputed versions of the columns that I calculated earlier in the presentation. And I'm going to be working to find combinations of the mixture factors that achieve, as closely as possible in a least squares sense, an ideal curve created by the practitioner, one that maximizes the amount of potential product in a batch while minimizing the amount of the impurities they realistically thought a batch could contain. I begin the analysis by going to the Analyze menu and bringing up the Functional Data Explorer. This setup has rows as functions. I load up my imputed response columns, and then I put in my formulation components and my processing column as supplementary variables. We've got an ID function, which is batch ID. Once in, I can see the functions, both overlaid all together and individually. Then I load up the target function, which is the ideal; that will change the analysis that results once I start going into the modeling steps. These are pretty simple functions, so I'm just going to model them with B-splines. Then I go into my functional DOE analysis. This fits the model that connects the inputs to the functional principal components and then connects all the way through the eigenfunctions, so that we can recover the overall functions as they change while we vary the mixture factors.
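As a compact way to see what the functional DOE step is doing, the response curves can be written in a standard functional-principal-components form. The notation below is generic textbook notation, not taken from the JMP documentation.

% Each batch's response curve expanded in eigenfunctions (FPCs):
y_i(t) \;\approx\; \mu(t) + \sum_{k=1}^{K} s_{ik}\, \psi_k(t), \qquad K = 4 \text{ in this case study.}
% Functional DOE step: each FPC score is modeled as a function of the
% mixture inputs x and the processing variable z (here via pruned forward selection):
s_{ik} \;=\; f_k(x_i, z_i) + \varepsilon_{ik},
% so the predicted curve at a new setting (x, z) is recovered as
\hat{y}(t \mid x, z) \;=\; \mu(t) + \sum_{k=1}^{K} f_k(x, z)\, \psi_k(t).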
The functional principal component analysis has indicated that there are four dimensions of variation in these response functions. To understand what they mean, let's explore with the FPC profiler. Watch this pane right here as I adjust FPC 1: we can see that this FPC is associated with peak height. FPC 2 looks like a kind of peak narrowness; it's almost like a resolution principal component. The third one is related to a knee on the left of the dominant peak. And FPC 4 looks like it's primarily related to the impurity. So that's the underlying meaning of these four functional principal components. We've characterized our goal as maximizing the product and minimizing the impurity, and we've communicated that to the analysis through the ideal or golden curve that we supplied at the beginning of this FDE exercise. To get as close as possible to that ideal curve, we turn on desirability functions, and then we can maximize desirability. We find that the optimal combination of inputs is about 4.5% of Ingredient 4, 2% of Ingredient 6, 2.2% of Ingredient 8, and 1.24% of Ingredient 9, using processing method two. Let's review how we've gotten here. We first imputed the missing response values. Then we found B-spline models that fit those functions well in the FDE platform. A functional principal components analysis determined that there were four eigenfunctions characterizing the variation in this data, and these four eigenfunctions were determined, via the FPC profiler, to each have a reasonable subject matter meaning. The functional DOE analysis consisted of applying pruned forward selection to each of the individual FPC scores, using the DOE factors as input variables, and we see here that this has found the combinations of interactions and main effects that were most predictive for each of the functional principal component scores individually. The Functional DOE Profiler has elegantly captured all aspects in one representation that allows us to find the formulation and processing step that is predicted to have desirable properties, as measured by high yield and low impurity. So now we can do an InfoQ assessment of the generalizability of the data and the analysis. In this case, we're more interested in scientific generalizability, as the experimenter is a deeply knowledgeable chemist working with this compound, so we're going to rely more on their subject matter expertise than on statistical principles and tools like hypothesis tests. The goal is primarily predictive, but the generalizability is somewhat problematic because the experiment wasn't designed. Our ability to estimate interactions is weakened for techniques like forward selection and impossible via a least squares analysis of the full model. Because the study wasn't randomized, there could be unrecorded time-order effects. We don't have potentially important covariate information like temperature and reaction time, which creates another big question mark regarding generalizability. Repeatability and reproducibility of the study are also unknown here, as we have no information about the variability due to the measurement system.
Fortunately, we do have tools like JMP's Evaluate Design to understand the existing design, as well as the Augment Design tool, which can greatly enhance the generalization performance of the analysis. Augmenting can improve information about main effects and interactions, and a second round of experimentation could be randomized to further enhance generalizability. So now I'm going to go through a couple of simple steps to show how to improve the generalization performance of our study using the design tools in JMP. Before I do that, I want to point out that I had to convert the data to proportions rather than percents; otherwise the design tools did not agree with the data very well. We go into the Evaluate Design platform and load up our Ys and our Xs. I requested the ability to handle second order interactions, and I got an alert saying that it can't do that, because we're not able to estimate all the interactions given the one-factor-at-a-time data that we have. So I backed up. We go to the Augment Design platform, load everything up, and choose augment. We'll choose an I-optimal design because we're really concerned with prediction performance here. I set the number of runs to 148; the custom designer requested 141 as a minimum, but I went to 148 just to make sure we have the ability to estimate all of our interactions reasonably well. After that, it takes about 20 seconds to construct the design. Now that we have the design, I'm going to show the two most important diagnostic tools in the augment designer for evaluating a design. On the left, we have the fraction of design space plot. This shows that 50% of the volume of the design space has a prediction variance less than 1, where 1 is equivalent to the residual error. So we're able to get better-than-measurement-error quality predictions over the majority of the design space. On the right we have the color map on correlations. This shows that we're able to estimate everything pretty well. Because of the mixture constraint, we're getting some strong correlations between interactions and main effects, but overall the effects are fairly clean: the interactions are pretty well separated from one another, and the main effects are pretty well separated from one another as well. After looking at the design diagnostics, we can make the table. Here I have shown the first 13 of the augmented runs, and we see that we have more randomization; we don't have streaks where the same factor settings are used over and over again. That's evidence of better randomization, and overall the design is going to be able to estimate the main effects and interactions much better, having received higher quality information in this second stage of experimentation. So the input variables, the Xs, are accurate representations of the mixture proportions, which clearly captures what we're interested in. The responses are close surrogates for the amount of product and the amount of impurity in the batches, so we're in pretty good shape on question 7.1 there. The justifications are clear. After the study, we can of course go prepare a batch using the formulation recommended by the FDOE profiler, try it out, and see if we're getting the kind of performance we were looking for.
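Returning for a moment to the fraction of design space plot mentioned above, the "prediction variance less than 1" statement is in terms of the relative prediction variance. The standard textbook definition is sketched below; it is not copied from the JMP documentation.

% Relative (scaled) prediction variance at a point x in the design region,
% for model expansion f(x) and design matrix X:
\frac{\operatorname{Var}\!\left[ \hat{y}(x) \right]}{\sigma^2}
  \;=\; f(x)^{\top} \left( X^{\top} X \right)^{-1} f(x).
% Values below 1 mean the prediction variance at x is smaller than the
% residual (measurement) error variance sigma^2.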
It's very clear that that would be the way to assess how well we've achieved our study goals. Now for the last InfoQ dimension, communication. By describing the ideal curve as a target function, the Functional DOE Profiler makes the goal and the results of the analysis crystal clear, and it can be expressed at a level that is easily interpreted by the chemists and managers of the R&D facility. And because we have done our detailed information quality assessment, we've been up front about the strengths and weaknesses of the study design and data collection. If the results do not generalize, we certainly know where to look for the problems. Once you become familiar with the concepts, there is a nice add-in written by Ian Cox that you can use to do a quick quantitative InfoQ assessment. The add-in has sliders for the upper and lower bounds of each InfoQ dimension. These dimensions are combined using a desirability function approach into an overall interval for the InfoQ, over on the left. Here is an assessment for the data and analysis I covered in this presentation. The add-in is also a useful thinking tool that will make you consider each of the InfoQ dimensions, and it's a practical way to communicate InfoQ assessments to your clients or to your management, as it provides a high level view of information quality without using a lot of technical concepts and jargon; it gives information quality an easy-to-use interface. The add-in is also useful as the basis for an InfoQ comparison. My original hope for this presentation was to be a little more ambitious. I had hoped to cover the analysis I just went through as well as another, simpler one, where I skip imputing the responses and fit a simple multivariate linear regression model to the response columns. Today, I'm only able to offer a final assessment of that approach. As you can see, several of the InfoQ dimensions suffer substantially without the more sophisticated analysis. It is very clear that the simple analysis leads to a much lower InfoQ score; the upper limit of the simple analysis isn't that much higher than the lower limit of the more sophisticated one. With experience, you will gain intuition about what a good InfoQ score is for data science projects in your industry and pick up better habits, as you will no longer be blind to the information bottlenecks in your data collection, analysis and model deployment. This was my first formal information quality assessment. Speaking for myself, the information quality framework has given words and structure to a lot of things I already knew instinctively. It has already changed how I approach new data analysis projects. I encourage you to go through this process yourself on your own data, even if that data and analysis is already very familiar to you. I guarantee that you will be a wiser and more efficient data scientist because of it. Thank you.
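As a closing technical note on the add-in's "desirability function approach": in the InfoQ literature the overall score is commonly formed as a geometric mean of the individual dimension scores. Whether the add-in uses exactly this form is an assumption on my part.

% One common way to combine d_1, ..., d_8 (scores in [0,1] for the eight
% InfoQ dimensions) into an overall InfoQ score:
\mathrm{InfoQ} \;=\; \left( \prod_{i=1}^{8} d_i \right)^{1/8}.
% Applying this to the lower-bound sliders and to the upper-bound sliders
% separately would yield an overall interval like the one the add-in shows.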
André Caron Zanezi, Six Sigma Black Belt, WEG Electric Equipment Danilo da Silva Toniato, Quality Engineer, WEG Electric Equipment   Quality assurance and customer needs are rigorous requirements that frequently come down to reliability. Improving products in terms of reliability challenges engineers in multiple ways, including understanding cause-and-effect relationships and developing tests that reproduce customer conditions and properly generate reliable data without exceeding the product launch deadline. Combining engineering expertise, historical data and lab resources, a design of experiments (DOE) was performed to quantify product lifetime based on process, product and critical application variables. Performing several analyses using JMP tools, from the DOE platform to the Reliability and Survival modules, the team was able to describe product lifetime as a function of its critical factors. As a result, an accelerated life test was established that is able to simulate years of product usage in just a few weeks, providing solid evidence of some specific failure modes. By standardizing its methods and procedures, the test became a crucial requirement to verify and validate new technologies implemented in WEG motors, optimizing the development process and reducing time to market.   This poster provides information about how we used JMP to analyze data and develop an accelerated life test. The project followed a step-by-step approach. Project charter: to understand the primary and secondary objectives, a multidisciplinary team was formed to share information and knowledge about customer historical data, lab resources, motor reliability, cause-and-effect relationships, environmental application conditions and reliability data analysis. Historical data analysis: knowing and quantifying the risks of analyzing historical data, the team fitted life distributions to understand the Cycles to Failure (CTF) scale and shape parameters. The shape parameter, in particular, indicates the failure mode to be reproduced in the lab tests, according to the bathtub curve. DOE planning and analysis: in order to reproduce failures, the understanding of motor reliability was supported by cause-and-effect relationships provided by a Fault Tree Analysis (FTA). The FTA was a source of critical variables that were combined into a designed experiment (DOE) to quantify how to accelerate product cycles to failure. Conclusions: as a result, the DOE provided a surface profiler indicating the best conditions to accelerate product lifetime. The accelerated test also provided a shape parameter that, when compared with the historical data, shows an overlap, meaning that the same historical field failures were reproduced under controlled conditions. Implementation: with an accelerated test, development and innovation processes become faster while providing important information about product reliability.     Auto-generated transcript...   Speaker Transcript André Zanezi So, hello, everyone. My name is Andre Zanezi. I am a Six Sigma Black Belt at WEG, and I'm here today at Discovery Summit to talk about the development of an accelerated lifetime test to demonstrate and quantify washing machine motor reliability.   We know that every company, when developing new technologies and new solutions, often faces challenges in improving the reliability of its products.   We faced the same challenges.
The project goal was to analyze and quantify our historical reliability data and to develop a procedure, an accelerated life test in our internal labs, that reproduces the failure modes we see in the field. At the first step, we took historical data from our motors and, using the Reliability and Survival modules in JMP, fit life distributions to them. By fitting life distributions such as the Weibull distribution, we can understand our motors' reliability and lifetime. In JMP we can also fit different distributions for different failure modes, and we did that for the four main failure modes, compared them, analyzed them, and came to understand our motors' lifetime and reliability.   Doing that, we were able to quantify the scale and shape parameters: basically, how many cycles were necessary to produce a failure and, according to the bathtub curve, which kind of failure modes we were facing. We also cross-checked against our internal validation KPIs, basically taking the survival plots and comparing them with the internal KPIs, to check whether the probabilities and failure ranges were consistent, that is, whether our data were reliable. Understanding all these failure modes, we could develop an internal accelerated test. To do that, we had to understand the physics and the environmental conditions our motors work in. We did that through a fault tree analysis, basically deploying and understanding the cause-and-effect relationships. With that, we could identify the most critical variables in those relationships, and again use JMP to design an experiment to quantify the effect of those variables on our response, cycles to failure. Basically, we were trying to reproduce field failures in our labs. We ran several tests and, as a result of our experiments, we could fit models to our data using Fit Model and understand how the environmental and motor variables relate to cycles to failure. Through the survival plots and the surface plot, we understood that relationship and set a specific operating point to accelerate our motors' lifetime. Then, running a batch of samples at that point, we fit lifetime distributions to the results of our internal accelerated life test.   We were seeing failures, but at the end of these accelerated tests we had to ensure that we were causing the same failures as we had seen in the historical data.   So we came back to the lifetime distributions in the Reliability and Survival module and again fit Weibull distributions, now for the internal results of our accelerated lifetime tests.
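As a rough sketch of the life-distribution step described above, the Life Distribution platform can be launched from JSL roughly as follows. The column names are hypothetical, and the Weibull (and other) fits are selected from the platform report, so take this as an illustration rather than the team's actual script.

// Minimal sketch with hypothetical column names (Cycles to Failure, Censor, Failure Mode)
Names Default To Here( 1 );
dt = Current Data Table();

// Launch the Life Distribution platform on the historical field data,
// one analysis per failure mode; the Weibull fit from the report gives
// the scale and shape parameter estimates discussed above.
Life Distribution(
	Y( :Cycles to Failure ),
	Censor( :Censor ),
	By( :Failure Mode )
);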
We noted that the shape parameter, which according to the bathtub curve indicates the failure mode, could be compared with our historical data. Crossing both sets of information, we see an overlap between the shape parameter of the internal test and the shape parameter of the historical data, which means that we are reproducing the same failure modes in our accelerated life test.   It also means that we can now develop products faster, because every time we have a new technology or a new design, we can put it through this accelerated life test and quantify whether we are improving our motors' reliability, and we can do that faster than before.   We also did some technical cross-checks to prove that we are reproducing the same failures, in order to implement this test in the development process. That is how we used JMP to provide a lot of information and build it into our internal test. It was made possible by really good teams. Please feel free to make contact and send an email if you have any questions. And that's the end.
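For reference, the shape-parameter comparisons described in this talk use the standard Weibull parameterization below; this is textbook reliability notation, not taken from the poster itself.

% Weibull CDF with scale alpha and shape beta:
F(t) \;=\; 1 - \exp\!\left[ -\left( \tfrac{t}{\alpha} \right)^{\beta} \right], \qquad t > 0.
% Bathtub-curve reading of the shape parameter:
%   beta < 1 : decreasing hazard (early-life / infant-mortality failures)
%   beta = 1 : constant hazard (random failures)
%   beta > 1 : increasing hazard (wear-out failures)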
Jordan Hiller, JMP Senior Systems Engineer, SAS Mia Stephens, JMP Principal Product Manager, SAS   For most data analysis tasks, a lot of time is spent up front importing data and preparing it for analysis. Because we often work with datasets that are regularly updated, automating our work using scripted, repeatable workflows can be a real time saver. There are three general sections in an automation script: data import, data curation, and analysis/reporting. While the tasks in the first and third sections are relatively straightforward – point and click to achieve the desired result, and capture the resulting script – data curation can be more challenging for those just starting out with scripting. In this talk we review common data preparation activities, discuss the JSL code necessary to automate the process, and demonstrate how you can use the new JMP 16 action recording and enhanced log to create a data curation script.     Auto-generated transcript...   Speaker Transcript Mia Stephens Welcome to JMP Discovery Summit. I am Mia Stephens, and I am a JMP product manager. I'm joined by Jordan Hiller, who is a systems engineer, and today we're going to talk about automating the data curation workflow. This is the abstract, just for reference; I'm not going to talk through it. We're going to break this talk into two parts. I'm going to kick it off by talking about the analytic workflow and what we mean by data curation, and we're going to see how to identify potential data quality issues in your data. Then I'm going to turn it over to Jordan, who will talk about the need for reproducibility, share a cheat sheet for data curation, and show how to curate your data in JMP using the action recorder and the enhanced log. So let's talk about the analytic workflow. It all starts with having some business problem that we're trying to solve. Of course we need to compile data, and you can compile data from a number of different sources and bring the data into JMP. At the end, we need to be able to share results and communicate our findings with others. Sometimes this is a one-off project, but oftentimes we have an analysis that we're going to repeat. So a core question addressed by this talk is: can you easily reproduce your results? Can others reproduce your results? Or, if you have new or updated data, can you easily repeat your analysis, and particularly the data curation steps, on the new data? That is what we're addressing in this talk. But what exactly is data curation, and why do we need to be concerned about it? Data curation is all about ensuring that our data are useful in driving analytic discoveries; fundamentally, we need to be able to solve the problems that we're trying to address. It's largely about data organization, data structure, and also data cleanup. If you think about issues that we might encounter with data, they tend to fall into four general categories: incorrect formatting, incomplete data, missing data, and dirty or messy data. To help us talk about these issues, we're going to borrow some content from STIPS. If you're not familiar with STIPS, it is our free course, Statistical Thinking for Industrial Problem Solving. The course is based on seven independent modules, and the second module is exploratory data analysis.
Because of the iterative and interactive nature of exploratory data analysis and data curation, the last lesson in this module is called Data Preparation for Analysis, and we're borrowing heavily from that lesson throughout this talk. Let's break down each one of these issues. What do we mean by incorrect formatting? This is when your data are in the wrong form or the wrong format for analysis. It can apply to the data table as a whole: for example, you might have data stored in separate columns when you actually need it in one column, or your data might be in separate data tables that you need to concatenate, update, or join together. It can also relate to individual variables. For example, a column might have the wrong modeling type, or you might have columns of dates that are not formatted as dates, so an analysis won't recognize them as date data. Formatting can also be cosmetic. For example, in a large data table you might have many columns, column names that are not readily recognizable and that you want to change, or a lot of columns that you want to group together to make the table more manageable. Your response column might be at the very end of the data table and you might want to move it up. Cosmetic issues won't necessarily get in the way of your analysis, but addressing them can make your analysis a little easier. Incomplete data is when you have a lack of data. This can be a lack of data on important variables: for example, you might not have captured data on variables that are fundamental to solving the problem. It can also be a lack of data on a combination of variables: for example, you might not have enough information to estimate an interaction. Or you might have a target variable that is unbalanced. Say you're studying defects and only 5% of your observations are defects; that small subset might not be enough to understand the potential causes of defects. You also might simply not have a big enough sample size to get good estimates. Missing data is when you're missing values for variables, and this can take several forms. If the missingness is not at random, it can cause a serious problem, because you can end up with biased estimates. If data are missing completely at random, it might not be a problem when you're only missing a few observations, but missing a lot of data can be problematic. Dirty or messy data is when you have issues with observations or with variables. You might have incorrect values, where values are simply wrong. You might have inconsistency, for example typographical errors or people entering things differently. The values might be inaccurate; for example, you might have issues with your measurement system. There can be errors or typos, or the data might be obsolete. Obsolete data is when you have data on, for example, a facility or machine that is no longer in service.
The data might be outdated: you might have data going back over a two or three year period, but the process might have changed somewhere in that timeframe, which means those historical data are no longer relevant to the current process. Your data might be censored or truncated. You can have redundant columns, which contain essentially the same information, or duplicated observations. So dirty or messy data can take on a lot of different forms. How do you identify potential issues? A good starting point is to explore your data; in fact, identifying issues leads you into data exploration and then analysis, and as you start exploring your data, you start to identify things that might cause problems in your analysis. A nice first step is to scan the data table for obvious issues. We're going to use an example throughout the rest of this talk called Components. This is an example from the STIPS course in which a company producing small components has an issue with yield. The data were collected on 369 batches, 15 characteristics have been captured, and we want to use these data to help us understand potential root causes of low yield. If we start looking at the data table itself, there are some clues to the kinds of data quality issues we might have. A really nice starting point, added in JMP 15, is header graphs. I love header graphs. For a continuous variable they show a histogram, so you can see the centering, shape, and spread of the distribution as well as the range of the values. For categorical data, they show a bar chart with values for the most populous bars. Let's take a look at some of these. I'll start with batch number. Batch number is showing a histogram, and it's actually uniform in shape. Batch number is probably an identifier, so right off the bat I can see that these data are probably coded incorrectly. In another column I can see that the distribution is highly skewed and that the lowest value is -6, which makes me question the feasibility of a negative scrap count. Process is another one: I've got basically two bars, a histogram with only two values. As I'm looking at these header graphs, I can also look at the columns panel, and it's pretty easy to see that, for example, batch number, part number, and process are all coded as continuous. When you import data into JMP, if JMP sees numbers, it automatically codes those columns as numeric continuous, so these are things we might want to change. We can also look at the data itself. For example, when I look at humidity (and context is really important when you're looking at your data), humidity is something we would think of as continuous data, but I've got a couple of fields here with N/A. If you have text in a numeric column, the column is going to be coded as nominal when you pull the data into JMP, so this is something we're going to need to fix. I can also look through the other columns. For example, for Supplier I see that I'm missing some values; when you pull data into JMP, empty cells in a categorical column show up as missing values.
I can also see some entries where the data were not entered consistently, so I'm getting some serious clues about potential problems with my data. For temperature, notice all the dots: temperature is a continuous variable, and where I see dots, values are missing. Temperature is something that's really important for my analysis, so this might be problematic. A natural extension of this is to explore the data one variable at a time. One of my favorite tools when I first start looking at data is the Columns Viewer, which gives numeric summaries for the variables we've selected. If we're missing values, there will be an N Missing column, and here I can see that I'm missing 265 of the 369 values for temperature, which is a serious gap if we think temperature is going to be important in the analysis. I can also see whether I've got some strange values. When I look at the Mins and Maxes for number scrapped and the scrap rate, I've got negative values, and if that isn't feasible, then I've got an issue with the data or the data collection system. It's also pretty easy to see miscoding of variables. For example, facility and batch number, which should probably be coded as nominal, are reporting a mean and a standard deviation. A good way to think about this is: if it's not physically possible to have an average batch number or part number, then those should be changed to nominal instead of continuous. Distributions are the next place I go when I'm first getting familiar with my data. For continuous data, distributions let you understand the shape, centering, and spread of the data, and also whether you've got unusual values. For categorical data, you can also see how many levels you have. For example, customer number: if customer number is a potentially important variable, I've got a lot of levels here, and when preparing the data I might want to combine these into four or five buckets, with an "other" category for customers where I don't really have a lot of data. For humidity, we see the problem with having N/A in the column: we see a bar chart instead of a histogram. We can also easily see what we were looking at in the data for supplier: Cox Inc and Cox, Anderson spelled three different ways, Hersh spelled three different ways. For speed, notice that we've got a mounded distribution that goes from around 60 to 140, but at the very bottom there is a value or two pretty close to zero. It might have been a data entry error, but it's definitely something we'd want to investigate. An extension of this is to look at your data two variables at a time, for example using Graph Builder or scatterplots. When you look at variables two at a time, you can see patterns, and you can more easily see unusual patterns that cross more than one variable. For example, if I look at scrap rate and number scrapped, I see that I've got some bands, and it might be that something in your data table can explain that pattern. In this case, the banding is attributed to different batch sizes: this purple band is where I have a batch size of 5,000, and there is a lot more opportunity for scrap with a larger batch size than with a smaller one.
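As a rough idea of what the one-variable-at-a-time exploration looks like in JSL, a Distribution launch for a few of the Components columns mentioned above might look like the following. The exact column names are assumed, and the script JMP records for you may differ.

// Minimal sketch: univariate screening of a few Components columns
Names Default To Here( 1 );
dt = Current Data Table();

Distribution(
	Continuous Distribution( Column( :Temperature ) ),   // shows the many missing values
	Continuous Distribution( Column( :Speed ) ),          // shows the near-zero value(s)
	Nominal Distribution( Column( :Supplier ) )           // shows the inconsistent spellings
);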
So that might make some sense, but I also see something that doesn't make sense: these two values down here in the negative range. It's pretty easy to see these when looking at the data in two dimensions. I can add additional dimensionality to my graphs by using column switchers and data filters. This is also leading me into a potential analysis: I might be interested in understanding which Xs are related to scrap rate, and at the same time look for data quality issues. For scrap rate, it looks like there's a positive relationship between pressure and scrap rate. For the next variable, there doesn't appear to be much of a relationship. Scrap rate versus temperature is pretty flat, so there's not much going on there. But notice speed: there's a negative relationship, but across the top I see those two values, the one I saw in the histogram and a second one that seems to stand out. It could be that the value around 60 is simply an outlier, but it could be a valid point; I would probably question whether the point down near 0 is valid. So we've looked at the data table, we've looked at the data one variable at a time, and we've looked at the data two variables at a time, and all of this fits right in with data exploration and leads us into the analysis. There are more advanced tools that we might use (for example, Explore Outliers and Explore Missing Values) that are beyond the scope of this talk. And when you start analyzing your data, you'll likely identify additional issues. For example, if a categorical variable has a lot of categories and you try to fit an interaction in a regression model, JMP will warn you that it can't be done. So this whole process is iterative, and you'll identify potential issues throughout. A key point is to make note of the issues you encounter as you're looking at your data. Some of these can be corrected as you go along: you can hide and exclude values, and you can reshape and re-clean your data. But you might decide that you need to collect new data, or you might want to conduct a DOE so that you have more confidence in the data itself. If you know that you're going to repeat this analysis, or that somebody else will want to repeat it, then you'll want to capture the steps you're taking so that you have reproducibility: someone else can reproduce your results, or you can repeat your analysis later. So this is where I'm going to turn it over to Jordan, and Jordan's going to talk about reproducible data curation. Jordan Hiller Okay, thank you, Mia. Hello, I am Jordan Hiller. I am a systems engineer for JMP. Let's drill in a little and talk some more about reproducibility for your data curation. Mia introduced this idea very nicely, but let's add a few more details. The idea here is that we want to be able to easily re-perform all of the curation steps we use to prepare our data for analysis, and there are three main benefits to doing this. The first is efficiency: if your data changes and you need to replay these curation steps on new data in the future, it's much more efficient to run a one-click script than to go through all of your point-and-click activities again. Accuracy is the second benefit.
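For reference, a two-variables-at-a-time view like the scrap rate plots described above can be scripted roughly as follows. The column names are assumed from the Components table, and the column switcher and local data filter were added by point and click in the demo, so only the basic scatterplot is sketched here.

// Minimal sketch: scrap rate vs. one input, as a Graph Builder scatterplot
Names Default To Here( 1 );
dt = Current Data Table();

gb = dt << Graph Builder(
	Variables( X( :Speed ), Y( :Scrap Rate ) ),
	Elements( Points( X, Y ) )
);
// A Column Switcher (to step through Pressure, Temperature, Speed, ...) and a
// Local Data Filter can be added from the report's red triangle menu.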
Point and click can be prone to error, and by making it a script, you ensure accurate reproduction. And lastly there is documentation, which is maybe underappreciated. A script documents the steps that you took; it's a trail of breadcrumbs that you can revisit later when, inevitably, you have to come back to this project and remember what you did to prepare the data. Having that script is a big help. So today we're going to go through a case study. I'm going to show you how to generate one of these reproducible data curation scripts using only point and click, and the enabling technology is something new in JMP 16: the enhanced log and the action recording found in it. Here's what we're going to do: we perform our data curation activities as usual, by point and click. As we do this, the script that we need, the JSL code (JSL stands for JMP Scripting Language), is captured for us automatically in the new enhanced log. When we're done with our point-and-click curation, all we need to do is grab that code and save it out. We might want to make a few tweaks to make it a little stronger, but that part is optional. Okay, this is a cheat sheet that you can use. It covers some of the most common data cleaning activities and how to do them in JMP 16 in a way that leaves the JSL code in the enhanced log: operating on rows, operating on columns, ways to modify the data table, and so on, all by point and click. It's not an exhaustive list of everything you might need to do for data cleaning, and it's not an exhaustive list of everything that's captured in the enhanced log either, but it is the most important stuff, right at your fingertips. All right, let's go into our case study using that Components file that Mia introduced and make our data curation script in JMP 16 using the enhanced log. Here we are in JMP 16. I will note that this is the last version of the early adopter program for JMP 16, so this is pre-release; however, I'm sure it will be very similar to the actual release version. So, to the log. This looks different if you're used to the log from previous versions of JMP: it's divided into two panels, a message panel at the top and a code panel at the bottom. We're going to spend some time here, but first a quick preview. If I do some quick activities, like importing a file and deleting a column, you can see that those two steps (the data import and deleting the column) are listed up here in the message panel, and the JSL code we need for a reproducible data curation script is down here in the bottom panel. That's really very exciting; the ability to grab this code whenever you need it, just by pointing and clicking, is a tremendous benefit in JMP 16. In JMP 16, this new enhanced log view is on by default. If you want to go back to the old simple text log, you can do that in the JMP preferences; there's a new section for the log, where you can switch back to the old text view if you prefer.
The default when you install JMP 16 is the enhanced log, and we will talk about some of its other features further on in our case study. All right, I'm going to clear out the log from the red triangle menu and start the case study by importing that Components data Mia was sharing with you. We're going to start from the csv file, and I'll perform the simplest kind of import, just by dragging it in. Oh, I had a version of it open already; let me close the old version and clear the log one more time. Okay, a simple import by dragging it into the JMP window. Now we have that file, Components A, with 369 batches, and we can proceed with our data cleaning activities. I'll turn on the header graphs. The first thing we can see is that the facility column has just one value in it, FabTech, so there's no variation and nothing interesting here. I'm just going to delete it with a right click, delete the column. And again, that is captured as we go in the enhanced log. What else? Let's imagine that this scrap rate column near the end of the table is really important to us and I'd like to see it earlier in the table. I'm going to move it to the fourth position by grabbing it in the columns panel and dragging it to right after customer number. There we go. Mia mentioned that the humidity column is incorrectly represented on import, chiefly due to those N/A text values that are causing it to come in as a character variable. Let's fix that. We go into the column info with a right click, change the data type from character to numeric and the modeling type from nominal to continuous, then click OK. Let's click over to the log: you can see we have four steps captured now, and we'll keep going. All right, what's next? We have several variables that need to be changed from continuous to nominal: batch number, part number, and process. With the three of those selected, I right click and change from continuous to nominal, and those have all been corrected; again, we can see those three steps recorded in the log. Something else a little bit cosmetic: this column, Pressure. My engineers like to see that column name as PSI, so we'll change it just by selecting the column name and typing PSI, then tabbing out. That's going to be captured in the log as well. Next, supplier. Mia showed us that there are some inconsistent spellings and probably too many values in here, so we need to correct the character values. When you have incorrect or inconsistent character values in a column, think of the Recode tool; it is a really efficient way to address this. With a right click on supplier, we go to Recode and group these appropriately. I'll start with some red triangle options: convert all of the values to title case, and trim the white space so inconsistent spacing is corrected. That has already fixed a couple of problems. Let's correct everything else manually. I'm going to group together the Andersons, group together the Coxes, and group the Hershes. Trutna and Worley are already correct as single categories. The last correction I'll make is for the entries that are listed as blank or missing; I'll give them an explicit missing label here.
All right, and when we click Recode, we've made those fixes into a new column called Supplier 2 that has just six corrected, collapsed categories. Good. Okay, let's do a calculation. We're going to calculate yield using batch size and the number scrapped. And yes, I realize this is a little redundant; we already have scrap rate, and yield is just one minus scrap rate, but for the sake of argument we'll perform the calculation. I want the yield column inserted right after number scrapped, so I highlight number scrapped, go to the Columns menu, and choose New Column. We'll call it Yield, insert the new column after the selected column (after number scrapped), and give it a formula to calculate the yield. We need the number of good units, which is batch size minus number scrapped, and we divide that whole thing by the batch size. Number of good units divided by batch size: that's our yield calculation. Click OK, and there's our new yield column. We can see that it equals one minus scrap rate, which is good, and let's ignore for now the fact that we have some yields greater than 100%. Okay, we're nearly done; just a few more changes. I've noticed that we have two processes, and for now they're just labeled process 1 and process 2. That's not very descriptive or helpful, so let's give them more descriptive labels: process 1 we'll call the production process, and process 2 we'll call experimental. We'll do this with value labels rather than recoding. I go into column info and assign value labels: one in this column represents production (add that), and two represents experimental (add that). Click OK. Good. It shows one and two in the header graphs, but production and experimental in the data table. All right, one final step before we save off our script. Let's say, for the sake of argument, that I want to proceed with analysis only where vacuum is off. I'm going to subset the data and make a new data table that has only the rows where vacuum is off. I'll do that by right-clicking one of the cells that has vacuum off and selecting matching cells. That selects the 313 rows where vacuum is off. Now we go to Tables, Subset, and create a new data table, which we will name vac_off. Click OK, and that's our new data table with 313 rows, only the data where vacuum is off. So that's the end of the curation. We have done all of our data curation, and now let's go back, revisit the log, and learn a little more about what we have. All of those steps, plus a few more that I didn't intend to do, have been captured here in the log; every line up here is one of the steps we performed. There's also some extraneous stuff: at one point I cleared out the row selection, and I don't really need to make that part of my script, so let's remove it. I just right click on it and clear that item. Good. So: messages up here, JSL code down here. I'd like to call your attention to the origin and the result; this is pretty nifty. Whenever we do an action by point and click, there's something we do that action on and something that results. That's the origin and the result.
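The JSL that action recording captures for steps like these looks roughly like the sketch below. It is written from the steps described above rather than copied from the actual recorded log, and the exact column names and message options are assumptions, so expect small differences from what JMP generates for you. (The Supplier recode was done interactively with the Recode tool and is omitted here.)

// Sketch of the curation steps described above, assuming Components A is the current table
Names Default To Here( 1 );
dt = Current Data Table();

dt << Delete Columns( "facility" );                        // no variation, so drop it

Column( dt, "humidity" ) << Set Data Type( "Numeric" )     // N/A text forced it to character
                         << Set Modeling Type( "Continuous" );

For Each( {name}, {"batch number", "part number", "process"},
	Column( dt, name ) << Set Modeling Type( "Nominal" )    // identifiers, not measurements
);

Column( dt, "Pressure" ) << Set Name( "PSI" );             // cosmetic rename

dt << New Column( "Yield", Numeric, Continuous,            // yield = good units / batch size
	Formula( (:Batch Size - :Number Scrapped) / :Batch Size )
);

Column( dt, "process" ) << Value Labels( {1 = "Production", 2 = "Experimental"} );

// Subset to the rows where vacuum is off
dt << Select Where( :vacuum == "Off" );
vac_off = dt << Subset( Selected Rows( 1 ), Output Table Name( "vac_off" ) );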
So, for instance, take the step where we changed the column info for humidity. The origin, the thing we did it to, was the Components A table, and we see that listed here as the data table. When I hover over it, it says Bring Components A to Front, and clicking on it brings us to the Components A table. Very nice. The result is what we did to the humidity column: we changed it to data type numeric and modeling type continuous. You can see that down here, and if I click here, JMP goes to and selects the humidity column for us. You'll also notice the results are shown in green everywhere except this one last result, which is blue. That's to help us keep track of activities on different data tables. We did all of these activities on the Components A data table, and in our last activity we performed a subset on Components A whose result was a new data table called vac_off, so vac_off is in blue. We can use those colors to help keep track of things. The last feature I want to show you in the log is helpful if you have a really long series of steps and need to find just one: this filter box lets you search. Say I want to find the subset; there it is, the subset data table, and I can get directly to the code I need. Okay, so this is everything that we need. Our data curation steps were captured, and all we need to do to make a reproducible data curation script is go to the red triangle and save the script to a new script window. The import step, the delete column step, moving the scrap rate column: all of those steps are here in the script. We have syntax coloring to help us read it, and we have all of these helpful comments that tell us exactly what we were doing in each step. So this is everything we need, and I'm going to save it. I'll save it to my desktop; let's call it import and curate components. That is our reproducible data curation script. If I go back to the JMP home window and close everything except our new script, here's what we do when we need to replay those data curation steps: I just open the script file and run it by clicking the run script button. It opens the data, does all the cleaning, does the subsetting, and creates that new data table with 313 rows. Let's imagine now that we need to replay this on new data. I have another version of the Components file called Components B. It has 50 more rows, so instead of 369 rows it has 419; imagine we've run the process for another 50 batches and have more data. I want to run this script on Components B, but you'll notice that throughout the script it's calling Components A multiple times. So we'll just have to search and replace Components A with Components B. Here we go: Edit, Search, find Components A, replace with Components B, replace all. Fifteen occurrences have been replaced; you can see it here and here. Now we simply rerun the script, and there it is on the new version of the data. You can see it has more rows; in fact, the vacuum-off subset that was 313 rows before is 358 now. All right, so that is a reproducible data curation script that can run against new data. Okay, here is that cheat sheet once again.
This cheat sheet will be in the materials saved with the talk, so you can get to it and see how to point and click your way through data curation while leaving yourself a nice replayable, reproducible data curation script. The script we made didn't require us to do any coding at all, but I'm going to give you four tips that you can use to enhance such scripts just a little bit. The first tip is to insert one line at the beginning of your scripts; it's a good thing to do for all your scripting. Just insert the line Names Default To Here. This prevents your script from interacting with other scripts; the problem it prevents is called a namespace collision, and you don't really have to understand what it does, just do it. It's good programming practice. The second tip is to make sure there are semicolons between JSL expressions. The enhanced log does this for you automatically; it places the required semicolon between every step. However, if you do any modification yourself, you'll want to make sure those semicolons are placed properly. Just a word to the wise. The third tip is to add comments. Comments are a way to leave notes in the program without messing it up; they are something the JSL interpreter will not execute. Action recording in the enhanced log has already left some notes for you, but you can modify them and add to them if you like. Here are the main points about comments. The typical format is two slashes, and everything that follows the slashes is a comment. You can do that at the beginning of a line or at the end of a line: the interpreter will run this X = 9 JSL expression, but then ignore everything after the slashes. If you have a longer comment, you can use the format with /* at the beginning and */ at the end, which encloses a comment. Comments are useful for leaving notes for yourself, but they're also useful for debugging your JSL script. If you want to remove a line of code and make it not run, you can just preface it with the two slashes, and for a larger chunk of code you can use the /* */ format. The last tip I'll leave you with is to generalize data table references. Do you remember how we had to search and replace to make that script run on the new file, Components B? We had to change 15 instances in the script. Wouldn't it be nice to change it once, instead of 15 times? You can make your scripts more robust by generalizing the data table references: instead of using the table names, we use a JSL variable to hold those table references. Here's what I'm talking about; I'll show you an example. On the left is some code that was generated by action recording in the enhanced log. We're opening the Big Class data table, operating on the age column (changing it to a continuous modeling type), and then creating a new calculated column. Notice that the data table reference appears in each step: on open, when we perform the change on the age column, and again over here. What you need to do to make this more robust and generalized is very simple: you need to make three changes. In this first change over here, we are assigning a name. I chose BC; you can choose whatever you want. You'll see DT a lot.
So BC is the name we'll use to refer to the Big Class data table in the rest of the script. When we want to change the age column, we refer to it through BC: the BC data table and the age column in that data table. Down here, we're sending a message to the Big Class data table, and that's what the double arrow syntax means; we generalize that as well, so that we use the new name to send the message to the data table. Now if we need to run this script on a data table named something other than Big Class, here's the only change we need to make: we change just one place in the script. We don't have to do search and replace. Okay, so after those four tips, if you're ready to take your curation script to the next level, here are some next steps. You could add a file picker; it doesn't take much coding to change the script so that when somebody runs it, they can navigate to the file they want it to run on instead of having to edit the script manually. That's one nice idea. If you want to distribute the script to other users in your organization, you can wrap it up in a JMP add-in, so users can run the script just by choosing it from a menu inside JMP. Really handy. And lastly, if you need to run this curation script on a schedule, in order to update a master JMP data table that you keep on the network somewhere, you can use the Task Scheduler in Windows or Automator in macOS to do that. So in summary, Mia talked about how to do your data curation by exploring the data and iterating to identify problems. If you automate those steps, you will gain the benefits of reproducibility: efficiency, accuracy, and documenting your work. To do this in JMP 16, you just point and click as usual, and your data curation steps are captured by the action recording that occurs in the enhanced log. Finally, you can export and modify the JSL code from the enhanced log to create your reproducible data curation script. That concludes our talk. Thanks very much for your time and attention.
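Pulling those tips together, a minimal sketch of the generalized pattern described above might look like the following. The ratio column is a hypothetical stand-in for whatever calculated column the recorded script actually created.

Names Default To Here( 1 );   // tip 1: avoid namespace collisions with other scripts

// Tip 4: hold the data table reference in a variable so only one line
// needs to change when the script is pointed at a different table.
bc = Open( "$SAMPLE_DATA/Big Class.jmp" );

// Refer to columns through the variable rather than the table name
Column( bc, "age" ) << Set Modeling Type( "Continuous" );

// Send table messages with the double-arrow syntax to the same variable
bc << New Column( "height to weight ratio", Numeric, Continuous,
	Formula( :height / :weight )    // hypothetical calculation, for illustration only
);

/* Tip 3: block comments like this one are handy for longer notes
   or for temporarily disabling a chunk of code. */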
Aaron Andersen, JMP Principal Software Developer, SAS   This is an introduction to JMP Projects, with a focus on new project features introduced in JMP 16. JMP Projects provide a single-window, tabbed interface to JMP data, platforms and results. Projects can be used to quickly save and reopen a set of related files and reports, or to organize windows belonging to separate analyses. Starting in JMP 16, data tables, scripts and other files can be embedded in the project file, creating a self-contained project that can easily be shared, archived and distributed.     Auto-generated transcript...   Speaker Transcript Aaron Andersen, JMP So, welcome to Organize and Share Your Work with Self-Contained Projects in JMP 16. My name is Aaron Andersen and I am the software developer at JMP responsible for the projects feature. Normally I would say I'm coming to you live from SAS World Headquarters, but this time, neither of those things is true. First, because they asked us to record these presentations in advance; they didn't tell me why, but I suspect they have doubts about my ability to be coherent and instructive at three o'clock in the morning, and they might be right. And second, because none of us are actually at work yet; we're still all working from home. So I'm coming to you this morning, prerecorded, from the corner of my bedroom closet in Apex, North Carolina. We do the best we can. Nevertheless, I hope to teach you a few things about projects that can help you use JMP better and more efficiently, so let's jump right in. So what is a project? It's a way of viewing a set of JMP content in a single organized window, and it's a way of saving a set of JMP content into a single organized file. The best way to understand that is to see it in action. Let's get started and build an example project together. To create a new JMP project, launch JMP 16 and go to file, new, project, or Control-Shift-P on Windows and something similar on Mac. What I get is a new, empty project window. Now we just need some data. The data I want to use today is called dinosauriformesUSA.csv. Dinosauriformes are a clade of reptiles that includes dinosaurs, their close relatives and all their descendants. I got this data from a group called the Paleobiology Database. They have information on, I think, every archaeological site — or paleontology site, rather — in the history of the world, with information on who, where, how, what, and so on. It's all Creative Commons licensed and you can download any subset of it you want in any format you want, so it's fun data to play with. And this is something I made: just the dinosauriformes in the United States of America. What I want to do is import this data into my project. So first I'm going to drag it over into the project contents. I'll explain what that does in a bit. For now, I just add that to the project; I can then right-click and use JMP's import wizard to import the data. My data...here's some header information and then there's the actual data. My data starts on line 19. I know that because I rehearsed this presentation, not because I can count that incredibly fast. Next, the column names are right, and import. And there we have 9,531 rows of dinosauriforme data, and lots and lots of useful columns, most of which don't mean much to me. I'm not a paleontologist, but I did read the information that came with the table, and I know what some of these are. So let's try to get a feel for what's in this data.
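For the record, the import step just described can also be captured as JSL. A minimal sketch is below, assuming the file sits in the current directory; line 19 as the start of the data comes from the demo, the header-row line number is a guess, and the full set of wizard choices is most reliably captured by saving the table's Source script after importing interactively.

    Names Default To Here( 1 );
    // Hypothetical path; the wizard choices end up in Import Settings when
    // you save the table's Source script.
    dt = Open(
        "dinosauriformesUSA.csv",
        Import Settings(
            Column Names Start( 18 ),   // assumed header row; not stated in the demo
            Data Starts( 19 )           // from the demo
        )
    );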
I think I'll first look at just the count by state, so I can kind of see where in the country all of these fossils were found. To do that I'm going to use JMP's distribution platform. I'm going to run distribution by state. The filter's helpful here...state...okay. And that opens in a tab in the project. Now I have a data table, and I have this distribution report, and all of them live inside the project window. I can use this workspace tool pane to toggle between them, or I can click on the tabs up top. So we're starting to see the value of projects here. I can order this data by count. Now I can start to see where this data is. At the bottom here, we have the states in the United States where the most paleontology digs have been located: Wyoming, Montana, and New Mexico. Those all sound like dinosaury places — the archetypal hot, dry, barren plains from, you know, the opening scene in Jurassic Park. I am surprised by how few there are in the Dakotas, though. What really surprises me is Florida. I don't think of Florida as being a particularly dinosaury place. It's very wet and it's very low lying; I think most of the state is like 10 meters above sea level or something. So that surprises me. I'd like to know more. To get a little better picture of where things are, I can create a map of where each of the finds has been and maybe what kind of fossils were found there. To do that I'm going to launch Graph Builder. It shows up in a separate tab. And I am going to graph latitude by longitude. Again, these filters, new in JMP 16, are super useful. Graph latitude by longitude, just showing the points. I want to put on a background map, so I can see the state boundaries. JMP comes with several of these pre-included, including this one. There we go. Maybe zoom in on this a little bit with the magnifier tool. And then I'd like to color by class, to get a broad generalization of what sorts of things were found where. Now, done. Maybe zoom in a little bit more. Focus. And I can already see the answer to the Florida question I had earlier, because all of the dinosaurs in Florida are aves, or birds. Whereas your saurischia and ornithischia (I still struggle to pronounce those words after all this time) are found more in the center of the country, where we think of the more typical dinosaur find places. So that's already pretty useful. But to really appreciate JMP's data discovery abilities, we have to remember that everything is linked. So it'd be helpful if we could see the data, the distribution and the map all at the same time and interact with them together. Projects make this really easy. I just take the distribution, grab it, drag it over to the side...there we are. Close that. Size this slightly. Then I'm going to take my map, I'm going to drag it to the top, just like that, and now I can see the map, the data table, and the distribution all together at once. And if I want to know about a certain state's dinosaurs, for example, I can click on it in here and follow it in there. Let's try that with, say, Utah. Now Utah is a state with a lot of dinosaur finds. It also contains something called Dinosaur National Monument, which I have fond memories of going to as a kid, where they have this neat little visitor center that's actually built into the side of the rockface where the fossils were found. So you can go in and see the dinosaur fossils in their natural habitat, as it were.
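For readers who prefer script to point and click, a rough JSL sketch of the two reports built here might look like the following. The column names (state, lat, lng, class) are assumptions based on the demo, not necessarily the actual names in the file, and the background map and zooming are easiest to add interactively and then capture with Save Script.

    Names Default To Here( 1 );
    dt = Current Data Table();   // the imported dinosaur table

    // Count of finds by state
    dt << Distribution( Nominal Distribution( Column( :state ) ) );

    // Map-style scatter of finds, colored by class
    dt << Graph Builder(
        Variables( X( :lng ), Y( :lat ), Color( :class ) ),
        Elements( Points( X, Y ) )
    );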
So I can look at this and I can start to see the kinds of fossils that are found, and they're certainly clustered in the southern areas of the state where it's hot and dry. Not so much up in the north, where all the ski slopes are. But I also see something else that troubles me a little bit. This point here is coded as being in Utah, but it doesn't really look like it's in Utah at all. It's clear out in Arizona, so something's wrong. Maybe the data is coded wrong at PBDB, maybe I misunderstood exactly what that state column means, or maybe something else. I can click on the row here and then, down in the data table, it will be selected. If I go to rows, next selected, it scrolls straight to it. I can look at this a little bit. It's sort of...there's the same guy found in both of the others...that doesn't tell me anything. Latitude and longitude. County, though, is the same as up here, so it may be that the state and county are right, but the actual latitude and longitude are wrong. I'm not sure. That warrants some further investigation, but for now, I'm just going to hide and exclude those rows from my analysis since they seem to be problematic, and I can come back later and investigate what's wrong with them. That's looking pretty good. To save the project: file, save project, and it's going to ask you for a file name. I'm going to say dinosaurs, but I can't really spell that, so I'll just change it to dinos.jmpprj, save, and that's it. You're done. Saving a project in JMP 16 is a one-click operation. The data table, the reports, and the layout of them on the screen have all been saved into this project file, which I just saved. And now I can leave this for as long as I want, and five seconds or five months later, I can come back to it, reload this, and everything comes back right where I left off. One of the very powerful features of projects is the ability to save your work whenever you want and come back to it whenever you want. So, how does that work? Well, the project goes out and saves everything in the appropriate place, depending on the type of thing that it is. These reports, it saves as JSL and puts that in the project file. The data tables, it depends on whether the table has been saved yet and, if so, where. Right now, this dinosauriformesUSA data table isn't saved anywhere. We imported it, and then we just left it as a new unsaved table. When we save a project containing a new unsaved file, JMP saves it automatically to a secret location in the project file, and then, when we reopen the project, it restores it back to the new, unsaved state. It returns to the same state it was in when we saved the project. However, since this is a very important data table in this project, we probably do want to actually save it at some point, if for no other reason than that it would allow us to restore to a saved state if we make a change to the data that we later regret. To save a data table in the project, you select the data table, and you can use the menu here, or you can use file, save. And now we get a dialog that's new in JMP 16, because JMP is going to ask us where we want to save the table to. This default location, project contents, is a reference to the files and folders that are saved inside the project file itself. Anything we save here will be embedded in the project file when the project is saved. Which is what I'd like to do with the dinosaur table, so I just use the default location, the file name is fine, hit save, and it shows up here in the project contents.
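As a quick aside before continuing with saving: the hide-and-exclude step above can also be scripted. A minimal sketch follows, with a purely illustrative condition standing in for whatever actually identifies the suspect row.

    Names Default To Here( 1 );
    dt = Current Data Table();   // the dinosaur table from the demo
    For Each Row(
        If( :state == "Utah" & :lat < 37,   // hypothetical condition for the stray point
            Hidden( Row State() ) = 1;
            Excluded( Row State() ) = 1;
        )
    );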
If I save the project again, everything is written to disk in that way. This project contents pane shows me an always-available view of everything that I have saved inside the project. I can create folders in here, like original data, and I might move my csv file into that folder. There it is. Now it's gone. So I can use this area just like I would use a folder on my disk somewhere, but everything I put here becomes part of the project. Save the project, close it out. Now I have this file with that saved file in it. Of course there's just this one data table at the moment, but for larger projects, I'll often want to have more than one table, and that's perfectly fine in JMP. Maybe I want to look at dinosaurs in North Carolina, where I live. I could look into whether any of those are still able to be visited. We could take my kids there. Find North Carolina. There we are, the North Carolina dinosaurs. I could go down to the data table and do table, subset by selected rows, and call this dinosauriformesNC. Okay. There it is. Maybe I tab this with my original data table. And there I have it. I've got two data tables, one that has been saved to the project contents and one that hasn't been saved at all. I have a map in Graph Builder and a distribution. Save all of that together. Close it out. This is where the sharing part comes in, and the title of this presentation said organize and share. And the reason we're talking about sharing so much is that sharing projects is super easy now that we have self-contained projects. This one file contains everything that I need, and therefore everything anyone else needs, to open and view this project. So if I, say, wanted to share it with a coworker, I could just attach it to an email. And that's it. That's everything that John needs to open my dinosaur project, in this single attached file. I can also use these self-contained projects to upload to the JMP User Community and share my work with the larger JMP community. Or if I encounter a bug, I could send this project to JMP tech support and ask them to look into what went wrong and why. Another thing that I like to do is make backup copies of the project, especially if I'm working with complicated scripts and I want to be able to return to a known state. I can take dinos and make dinos version one or whatever, and now this has got all the data tables inside of it, so if I open this one and I make a change to the data table, I still have my backup copy back there that I can pull back if I need to. I can even write an automated script that takes this every day, timestamps it, and sticks it in an archive somewhere. The other thing projects can do, of course, is open things that aren't saved in the project. You don't have to make a standalone project if you don't want to. I can open a file on disk; maybe I want to include this visitor center picture in the project. I can just open that. There it is, part of my project, opened in a tab. If I save the project now, the image is just going to be referenced. I can see that it has a path, and the project is just going to reference it by path. So this file here now requires this file here in order to work properly. It's no longer self-contained. But there are some advantages to this. Maybe I want to include a file that is really, really big, and I don't want to have a copy of it in the project. Or maybe the file is created by some external system and automatically updated every night at midnight, and I always want to have the latest version.
Or if I want multiple projects to share the same data table or same script, such that if any one of them makes a change to it, the others all get the updated version. I can have them all point to the same file somewhere on my computer or a network drive somewhere. It's really up to you how you work with projects. You can find the pattern and, kind of, the way that makes the most sense for how you use JMP and the thing that solve the problems that you run into. I think we're very close to out of time, but I will show off just a few more things quickly that you can look into on the JMP website or other resources for more information. One is the project bookmarks. It's another useful tool pane over here. And this works just like bookmarks would in a web browser or other system in that I can make links to files that maybe aren't always open. I can bookmark a file on disk or I can bookmark a file from the project contents and that that will just be there as a quick link way of opening that file in the future if I want to have access to it, but not always have it open. Also, a project log pane, which I can use to see log messages generated by the project contents of...that is, by the all the things open in the project. This is especially useful for scripters and activities that generate lots of log messages that I care about. And lastly, in JMP 16 we have a setup JMP preferences to allow you to customize how you use projects and how projects appear to you. These first two could allow you to automatically create projects when JMP opens or automatically create a project around a file that you might open that is in a project. Here you can decide what you...what you want the default project to look like, what tool panes you want to appear. And this one, I'm quite proud of. The project template allows you to create a project, save it to disk, and tell JMP that every time I do file new project, I want a copy of that project to show up as my new blank project. So if you have certain files that you or your organization uses all the time and you'd like every project you start to have that file open or maybe just that file bookmarked so you can get access to it quickly, you can create a project template, point JMP to it, and then every new project that you start will contain those files or those bookmarks or maybe just the layout of the windows and panes that you prefer. That is about all I've got. This is normally where I'd pause for questions, but since this is pre recorded, I'll just issue this invitation instead. If you have any questions, comments, feature suggestions or bug reports on projects, please feel free to reach out to me directly. Again, my name is Aaron Andersen. During or after the Conference or, and especially in the case of bug reports, talk to my friendly colleagues at JMP tech support. Lastly, I will note that the project we just built here, dinosauriformes.jmpprj, is included in JMP 16 as a sample project. So if you'd like to continue looking into this, figure out what's wrong with those Utah but not Utah rows, or if you'll visit the United States in the near future and you'd like to see if there are any dinosaurs that have been found in the area where you will be, you can find this project in the JMP 16 sample projects folder. Thank you for coming. I hope you have a good time at the rest of the conference. Best wishes and good night.  
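The daily, timestamped backup idea mentioned a little earlier in the talk is straightforward to sketch in JSL; the paths below are hypothetical, and scheduling the script would be done outside JMP (for example with the operating system's task scheduler).

    Names Default To Here( 1 );
    // Hypothetical locations for the project file and the archive folder
    src = "C:/Projects/dinos.jmpprj";
    archive = "C:/Archive/";
    stamp = Format( Today(), "yyyy-mm-dd" );                      // e.g. "2021-03-15"
    Copy File( src, archive || "dinos_" || stamp || ".jmpprj" );  // timestamped copy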
Scott Wise, Principal Analytical Training Consultant, Learning Data Science Team, SAS   A picture is said to be worth a thousand words, and the visuals that can be created in JMP Graph Builder can be considered fine works of art in their ability to convey compelling information to the viewer. This journal presentation features how to build popular and captivating advanced graph views using JMP Graph Builder. Based on the popular Pictures from the Gallery journals, the Gallery 6 presentation highlights new views available in the latest versions of JMP. We will feature several popular industry graph formats that you may not have known could be easily built within JMP. Views such as death spiral run charts, marker expressions, hexagonal heat maps, cartograms and more will be included that can help breathe life into your graphs and provide a compelling platform to help manage your results. A JMP Journal with data, scripts and instructions is attached so you can recreate these views for yourself.     Auto-generated transcript...   Speaker Transcript Scott Wise More Advanced Graph Building Views. My name is Scott Wise, and in this edition we're going to see a collection of surprisingly beautiful and very useful visualizations that can be built in Graph Builder. So to kick us off, I'm joined by my new business associate, the Gillman, better known to those in the movie business as the Creature from the Black Lagoon. So, Gillman, I found out that you're using a lot of our firm's money to buy beachfront property. Is that right? Okay, are you worried at all about global warming that might be causing the sea levels to rise and flood? Are you a little worried about that? I am too, so let me show you. I'm gonna share my screen real quickly and let me show you what I found out is going on in JMP. So I've got some data that the University of Colorado has been posting on the average rise in global sea levels. And if I just take a look at a quick graph, you can see that there's a big trend upward when it comes to, you know, proof, right, that the sea levels are increasing year after year. All right, all right. Well, here's the problem I see. I looked at the projections, and I took one of your latest properties, and it was in a pretty high-risk coastal flooding area. And I just kind of looked at how many feet the floods might be — you know, projected it out if things do go bad. And I even came up with an example of our two-level beach house. Does that look right? Okay, and this is what the water level will look like over the years. So if we take a look at it maybe about four years out, we're starting to lose our beach, you know. We take a look at it almost, you know, nine years out, and it's almost up to our steps. We get out about 19 years, and it's getting into the first floor. I mean, if we're around there for 80 years, we don't even have a first floor anymore. So do you have any contingency plans if this happens? You do. Oh, thank you, you've got a prospectus as well? Oh. Oh, my goodness, thank you. So, your Creature from the Black Lagoon Bed and Breakfast, huh? I noticed that Gillmen are welcome. And let me show this to the folks who are watching over Zoom here: we have ocean access and a free fish buffet. Well, I'm glad you've got that covered and I guess you're ready for it. I don't think most of us who are human are ready for it, though, but thank you for helping me out and I'll see you for sushi later.
All right. Thank you so much, thank you, Mr Gillman. All right, so I'm going to go back and show my screen, and let me show you how I built this chart. So this chart here is just simply a projection: I've got year over here, starting with, you know, this year and going out into the future, and then I've put the possible flooding sea rise level in this column. So to make this thing work, all we're going to do is go into Graph Builder. I'm going to put the sea level rise on the Y axis, and instead of points I'm going to ask for an area chart. The area style I want is overlaid, and the summary statistic will be the range. Okay, now there's a 10 foot possible range. I want to make this look like my average house, so I'm going to go and make it about a 20 foot house, like 10 foot ceilings — maybe I'm so lucky, right? And now it's just real simple to go ahead and size down this chart the way I want it. And I'm going to go find a picture that I think is representative, so here's a picture of this beach house. I just grab it and drag it. Now that I've grabbed it and dragged it, I can go in and just kind of manipulate this view and stretch it so it fits my space. Yeah, I'll say done here. And under the hotspot, now I can bring in the year and make it interactive. So I'm going to go under this red triangle — I like to call them hotspots — go to local data filter, and add in the year. You notice, as I added the year, it's continuous. That's not going to be as good for a range selection, maybe, so I'm going to go under that red triangle, go to the modeling type, and change it to nominal/ordinal. I get kind of this list view, and now I can just go click on 2021, and holding my control key down, or my command key if you're using a Mac like I am — that would be after the first year of the range. Here would be where we are again, then about nine years out, and then here would be the maximum if we go out almost 80 years. So I saw this used to great effect on the climate.gov website, where in the US they're trying to convince people that we definitely have some areas that are at risk of coastal flooding and subject to this rising sea level. And so they took pictures of some of my favorite places on the North Carolina coast and the Georgia coast and the Texas coast, and it was very impactful, even in a 2-D world here, to see that high-level floodmark go up in some of my favorite places. So I'm going to do a blog on this one; if you just go to the Community and look up local sea rise, you can go and see me apply this again and get the data, as well, to play with. All right, that was a lot of fun, and thank you, thank you for welcoming my guest. Let's talk about Pictures from the Gallery. So if you're a fan of this presentation I've been doing — it's in its sixth year — every year we come up with six really advanced views. And they're not advanced in terms of you have to be an advanced JMP user; they're advanced in terms of these are probably views you didn't know Graph Builder could give you in JMP. And they're really easy to do. You just have to know how to set them up, and the little tricks that make them shine. And so we're going to show six new ones. We have hexagonal heat maps. This is a new one coming up in JMP 16 and I'm really excited about it. We have map cartograms. We have time series.
This came out in JMP 15   by adding a time series to your Graph Builder. We have row ordinal spirals.   We have HDR box plots. This one came out in 15 and graphlets, which came out in 16. So I'm excited to show these to you. So if you...if you've seen my presentation before, you know I always give you a gift.   This journal I'm showing you, that's going to be posted in the link, it's going to be posted in the Community.   You will have it and anytime we talk about one of these graphs, you'll be able to see it, you'll be able to not only get some hints but you'll be able to follow the steps, and I even put in the data for you.   So hopefully this will be a big help for you, alright. So everything's got kind of a pandemic theme because we're all being affected by the pandemic here, and we're doing this talk virtually, not face to face.   our honey and the bees that make the honey. And   there's been a lot of good data on what's going on with the bee population in the United States and it's very threatened here.   I know SAS actually has their own beehives and we've taken a little bit of data on them and had great talks about what they look at, in terms of data.   But one of the things they worry about are colony collapse. And a colony collapse is when all the male bees, all the worker bees take off and they...   they go and take off and   leave the hive. Leave, kind of, leave the queen bee and the little bees behind, you know, the young bees behind. And it basically is is pretty devastating and a lot of research is going on what is causing those. But what we have here   is   a heat map, but it has a special shape, it has the hexagonal shape, and I'm going to see if this can help us see the effect of colony loss over some recent years, all right. So you're going to have your tips, you're going to have your steps. Let me open it up and show you how this works.   So, right here I've got my US State Bee Colony   information. I have time periods, so this is kind of a season, you know, and I look over the winter and see how many bees get lost.   So that's this percentage right here. So here in 2008 in Alabama, of all hives that got reported, they lost about 39.7% of their bees.   And we know how many beekeepers and how many colonies were done that year or active that year and being measured.   So, to pick this up I'll just stick the Graph Builder. Here in the Graph Builder, I'll take the total winter/single state only loss. I'm going to put this on the Y. I am going to take,   as well,   the time period.   Or before I take the time period, I think I'll put the colonies on the X and I'll put the beekeepers on the color.   And I'm going to take off this smoother, I've just got points. Right now, I'm going to go and I'm going to select the heat map.   Now I've got this heat map selected.   I am going to go and change the view. Now, before I change the view, it's looking like I got a lot of things that   are kind of between zero and...what's this, like 50,000 colonies? So to handle that one, I'm going to right click on the colonies on the axis setting and I'm going to ask for the log.   And up here, this one's doing percentage. That's pretty good. I'm going to go here to the axis settings and I know they consider anything over 30% to be a problem, so I'm going to add a little   reference line right there. So I got that line going across. Now I can go under the heat map area. I can go under this bin shape and now in 16, I can turn on this hexagonal shape.   Now this looks pretty good. 
Got too much stuff that is in the blues here in this beekeeper gradient, so I'll right click. I'll go to gradient. I will go under the color theme. I've got this...   it's called black body, but it really goes from kind of white and yellow all the way up...that kind of looks to a dark red to a black kind of color. I like that one. I'm going to say okay. I'm going to maybe put eight labels of them.   And I'm going to make this one log as well, so the colors are a little more dispersed. That's looking better. Now can we do something about the time period?   Yes, we can, so I'm going to put the time period up here on the X axis, that gives me different panel views. I'm going to right click right there under time period, levels in view. I'm going to put two at a time and I'm just going to scoot over 'til the most recent two years of the data.   Okay, this looks good. Now some things that are lighter color are starting to fade and how you handle that, you will right click   within the data, and I'm gonna have to do it twice since I've already got this thing paneled, but if I did it before, I wouldn't have to do it twice.   And I'm just going to right click, go to customize, and where I see the heat map shapes, I'm going to add an outline. And you can see what it did. And I'm just going to do the same thing over here as well.   And heat map cells, add it to the line width here. There we go and then you can work and get the right size bin. And now I can see something like here 2018, 2019 in Pennsylvania,   they were between 43 and 50% of colony loss, so they might have been having some of this colony collapse, but now in 2019, 2020 things have gotten a lot better,   33 to 40% and it looks much better, you know, the previous season. We'll see how this season turns out. Researchers think that colony collapse could have something to do with, like, pesticides that might be affecting   the neural behavior of bees, even beekeeping methods and so it's something we can deal with. But in other parts of the world, the bee colonies are actually growing at a rapid pace like China, so I think this is definitely something   we can get a handle on but measuring it is a good place to start. Alright, so that is a pretty good start to our data. Again, you have all this available to you. I will move on. I am going to go to   the cartogram, alright. So this is looking at the number of McDonald's outlets because,   you know, during the pandemic, picking up food is one of the things...one of the few things we can do. I can't eat in restaurants very safely,   so I appreciate a good quality fast food restaurant and they are...some things at McDonald's are very, very good. They've got good coffee too.   So they're around the world, and I know they're consistent everywhere I've traveled. So if I'm telling my friends out in Europe, maybe where where the McDonalds are   in recommending that for dinner, for lunch, here is a map. Now it's doing something a little different here, you see the outline of the states...   of the countries, I should say, but you'll also see that in some of the countries, the shaded area still...is still respecting the boundaries   of the country, but it's smaller and that's giving you some sort of idea of the people per outlet. So the fuller it is, the more people per outlets are being served.   The smaller it is, the more spread out the people are, some of the outlets the McDonald restaurants are serving less people.   You can get kind of a color by number of outlets. 
And so this...the reason we like this is it kind of helps you with unbalanced   sizes in your map shapes, you know, because some countries might have a lot of territory,   be big and wide but they don't have too many people, and others might be small, but they're very densely packed. They might have a lot of McDonald outlets compared to a bigger place with less people, so this can help you.   I'm going to use backgrounds and lines is a good tip to do. Again, you have the instructions, but here we go. Pretty easy to do. I'm going to pick up from the data   what's going on. I'm going to put the country name on the shape.   I'm going to put the people per outlet on the size, and I'm going to put the number of outlets on the color.   Okay, now it doesn't look like anything is going on here, but um it will get better when we zoom in, because we've got every McDonald's outlet in the country.   So it's not really showing up. So under my red triangle, let's go to my local data filter. I'm going to add the only EU countries and those that have only have 300 or more. So when I click on this, I'm going to select just the EU countries. Now you see something starting to happen.   And just 300 plus outlets and that restricted myself down to just a couple of European countries, but now they pop out there, and now I can see what's going on in Spain   in France   and Germany and Poland and Italy. Okay and.   other things you can do. As you saw before, I definitely can drag in   the golden arches here.   Just from a from a picture file and, of course, you can as well,   you know, go and orient that picture. It's a nice little background. What if I wanted to add the country map? I would right click. I would go under graph. I would go under background map.   And maybe let's do a detailed earth, and there we go. There's the detailed earth we have. And again, you can play with the gradients and do other things as well. This one I'm probably going to right click as well. To to customize in here under the   shapes here, I'm probably going to make this more prominent.   Give them more of a black color and now I can see the outline a little better. So again a really cool thing you can do with maps to actually help you interpret it.   All right, number three is time series.   So here in the pandemic, well, I'm sure we're all playing more games with our families and the people we live with, so I took some classic board games.   And I went out to Google trends, which actually will let you have...which will actually, it's a data mining tool and it will go out and   show you the frequency of words that are talked about out there in social media over time, which is pretty cool. So I took since October and every day I look to see if we were starting to increase us talking about   games, right, these games. Are we talking about playing them? Are we talking about learning them? Are we looking stuff up online?   And you can see some things are happening on this chart. So let me show you how to do this. And the neat thing on this one is in 15, they allow us not only to   do a trend, you know, and put a line through it, if you fit a line, but they let us to do a time series forecast, which is really cool. And it's not going to replace   all the options you have under our main time series and time series forecast platforms in the modeling sections of JMP, but it will give you a good one, a good basic one you can use just in Graph Builder.   And let me show you how this one works.   
Alright, so let's go ahead and pick up that data I got from Google Trends. I can see how many times these games were mentioned — however they scaled them, they're scaled on an index. So what's going on? Oh, by the way, I do have a column here that's an expression column, where I dragged in a couple of pictures I thought might help me when I hover over some of these labels. So I'm not going to ruin it by showing you what those pictures are; I'll show them to you live, alright. So here I'm going to take the day and put it on the X axis. I'm going to take all the games and put them on the Y axis. So I've got points. I've got a smoother. I don't want a smoother, so I'll take it off. I'm not going to go to line; I'm going to go to fitted line, holding my little shift key down, and you can see, you know, chess has something going on better than the others. Those lines are straight; this one looks like it's trending. But down here under fit, in addition to polynomial, in 15 you can now go to time series. Now the forecast model is showing you the information up here. Unless you're really good at time series, it's probably not going to mean much to you. I'm going to turn that off. But seasonal period — remember, I was doing this by day, and I knew that we talk about games, and we play games more on the weekends — so I'm going to assume, like, a seven day seasonal period, and there'll be a trend. That's why you see, kind of, the up and down; it basically looks over seven days to actually figure out where things are going. And how many periods do you want to forecast? Let's forecast out 14 or...since we're doing this, I'm going to forecast out 21 days, how about that? OK, and now that we've done this, you can see what's going on. And I'll say done, and I can see that games like dominoes and backgammon are just not that interesting, right? Pretty flat. Mahjong and poker look interesting. Now, everything went up slightly over Christmas, because we played more games over Christmas, but let's take a look at what's going on with poker. Right before Christmas — oh, I've got a little picture of the chips and the symbol they use for the World Series of Poker. What happened was, given the length of the pandemic, these big poker tournaments that play Texas hold'em decided, in a lot of cases, that people weren't going to come back to conference centers, so why don't we hold most of it online? So like 95, 98% of all the gameplay is online, and then only that final table goes and meets in a, you know, socially distanced, controlled way, and does it live. So they've all gone to this format, and it's increased in popularity, so I think that's going to continue. So virtual online poker is hot, but what about this? I mean, Christmas jumped things up for chess, but it's been going gangbusters for chess overall. What's going on with chess? And if I look at what's going on here — oh, The Queen's Gambit. So if you're like me and you've watched a lot of shows on Netflix or other types of places, you can watch these fictional historical dramas; this was a really nice one about chess, about a young girl who fought her own demons, was really good at chess, and not only became a master but started to beat some of the best players in the world. And the minute that happened, everybody wanted to learn chess. So if there was a stock on chess, we should all have invested in it.
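For reference, the bare bones of this kind of Graph Builder chart in JSL look roughly like the following. The column names are assumptions based on the demo (the demo puts all the games on the Y axis at once; only one is shown here for brevity), and the time series fit options — the seasonal period and the number of forecast periods — are easiest to set interactively and then capture with Save Script, so they are only indicated in a comment.

    Names Default To Here( 1 );
    dt = Current Data Table();   // the Google Trends table from the demo
    dt << Graph Builder(
        Variables( X( :Day ), Y( :Chess ) ),
        // Switch the fit from polynomial to the time series fit in the Fit
        // options (seasonal period 7, forecast 21 periods in the demo), then
        // save the script from the finished graph to capture the exact JSL.
        Elements( Points( X, Y ), Line Of Fit( X, Y ) )
    );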
All right. So we're down to the bottom three here, checking how we're doing on time. Row order spirals. Now, Julian Parris is one of my good friends, and he hosted and did a great job with the JMP On Air series that you might have seen on the Community or watched live a little earlier in the year. And he used to bring up some challenge graphs that his friends would give him, and one of his friends gave him a chart that spiraled around and looked like a tornado. And he was able to replicate it in Graph Builder, but he made the comment, I don't see where this will be helpful to anybody. And I got thinking, where do we care about spirals? Figure skating — there's something called the death spiral in figure skating, which is where one of the skating partners is in the middle of the radius there and he's holding you, and you're skating on the outside doing a 360, making a wide circle, and then he pulls you in and you go faster and you get closer and closer and closer, and it's really eye-popping, right? Well, pilots do this as well, and they call theirs a graveyard spiral, or sometimes a suicide spiral. And what happens is, you're flying your plane, you're in clouds, it's at night, you don't see the horizon, you can't see the ground. Maybe you don't have instruments — that would be the worst — but what happens is you feel like you're dropping in altitude. And you feel like something's wrong, because you feel like you're level, but what it is, is you've actually gotten yourself into a spiral and you're actually banked. And if you stay with it, you'll actually go faster and you'll actually tighten that spiral, and you will eventually, you know, crash into the ground unless you can pull out. So they teach you how to get out of a graveyard spiral, and I wrote these lessons down, and people were like, can you put notes — labels, you know — on JMP charts? Oh sure, in Graph Builder, it's great. So I colored this first spiral, this phase one. Don't panic; the secret here is trust your instruments, not what you feel. And your instruments will tell you that you need to level your wings. So in phase two, you have time to level your wings if you stay calm, if you reduce the power. You pull up a little on the nose, then you take control. In phase three, once your power's normal, you return to normal airspeed, and you can see this plane got out of it and it's going to a good area. Okay, so here's how we make this graph work. We go into this graveyard spiral data, okay. I've got X, Y, and I've got the phase. So here in Graph Builder, I'm going to take my X and put it on the X; I'm going to take the Y and put it on the Y. I kind of see the shape just with the points, but the minute I try to add a line, it gets all out of whack. Well, there's a big row order box right here. I click that row order, and it brings it back. I'm going to hold my shift key down and add back in the points, and now I'm just going to put phase on color, and there I am. That easy. So, pretty cool chart. Alright, so next, our fifth graph. Our fifth graph is the HDR contour plot. HDR stands for high density region, and it's a type of box plot that allows you to make some comparisons. You're going to set it up the standard way, but you have to know where to get it — it's under contour, which is kind of a different place to have it. So I'm going to open this up.
Now, in the pandemic, you're probably eating a bunch of stuff you probably shouldn't be eating, and candy is probably one of them, right? So Valentine's Day is coming up here pretty soon — and has probably passed by the time you see this recording — but definitely chocolate's good and, in moderation, can be good for you. I've got these major brands and I have their nutritional information, so let's see if charting can help me a little bit. So I'm going to go to Graph Builder. I'm going to put the brand on the bottom, and under nutrition here, I've got several things. Let's just start with one and get the view right. Let's just start with my calories. So I put the calories on the Y axis. Now, in addition to points, I'm also going to add in the contour. I click on the contour element here. It says violin type. I don't want violin, I want the HDR. And now what you're seeing is the shaded area around the mode — that's the density mode, and this is the most dense area. It's where more of the points are; if more of the points are at the ends, it will be at the ends. Okay, and you can see where those points are in relation. So that's a pretty cool chart. Now, can I make this a little better? Oh sure, I can color by brand. We can add in other things. We can add in the carbs. You know, you might add in the total fats. Say done. Now at this point, you might want to worry about these marker colors and make the points blue so they don't get totally washed out by the bars. And now, you know, you can click on M&M and see where they all are. So this can help you make a better decision, and you can also make some comparisons here over how the brands are doing with their products in relation to making healthier chocolate. Alright, so the last chart we are going to take a look at is the graphlet. And this graph is a little washed out, but I think you can see why it's good. So, you know, I always knew how to use filters to change up what I'm seeing in a graph. I even knew how to create a dashboard and, instead of using a filter, make the settings of one graph cast into other graphs — you know, control other graphs like it was a filter. So there are a lot of ways to do that, but graphlets are a wonderful way of using the hover labels to actually bring up other graphs — generally, drill-down graphs. And I'll be able to drill down more and more. So the things in this red bar — this happens to be the Asia region, and I'll show you this data live — when I click on the hover, it will give me the option to bring up this graph, which is a tree map, and when I click into this square, I can now bring up Tabulate. And this table here is giving me all the raw information underneath. The tips here are to try to order your columns from a high level, drilling down to a low level, and to start with the end in mind. Okay, so I'm going to bring up this international beer nutrition data, because what goes better with chocolate than a good international beer? So, no U.S. beers, but I do have Canada and Jamaica in here, as well as several countries in Asia and Europe. Alright, so the first graph — and I'm going to start with the last one I care about — is Tabulate, and I hope you've seen Tabulate before.
It's under analyze, tabulate, it's where we kind of do our   like, you know, pivot table created type of tables summaries, and you can just literally go in and dump in region and dump in country next to region.   And maybe the brewer next to that one, and then you can take...you can take a thing, like the out...   output, like the alcohol percentage, and you can get whatever statistic you want. So I've got this type of graph already open. Save that   as a script, so it's there. I'm going to keep that one open. Let's go to what the second graph was, the one in the middle, the middle level of detail. This one was a tree map, where I have country nested in region,   and I have it sized by calories, by average calories, I should say, and colored by average carbs. So to create this kind of chart, it's going to be simple to go out to Graph Builder.   It's going to be simple, just to go, let me take my region, first, and let me take my country, next to my region. Let me go ahead and ask for the tree map. So things are looking pretty good so far. I'm going to size it by...I think I sized it by carbs.   And I think I colored it by calories.   Now it jumps out at you and there's just a lot you can do here. You can even control the layout. I like the square five layout.   But that's how you create this chart, right. Now we'll keep that one open there in the back. And the last thing we're going to pull up is this point and line fit chart and it's kind of a cool chart.   I really just want to show the mean of, like, here's the mean alcohol level of Asia and the rest of the Americas and Europe. And to make this chart but with these kind of shading around my mean prediction there, to make that happen, I'm going to go on the Graph Builder.   There I pulled it up. Alcohol percentage. Maybe I'll just put the region down here, instead of points, when the summary statistic is mean. Now when I do that and I hold my shift key down and I turn in a fit line, it gives me, kind of, this confidence interval around my my my mean.   And I'm going to color by that region as well, or overlay, I should say, by the region. That's how I generated this view. Very simple. Okay, now I have   all these charts open. How did I make the graphlet work? Because when I'm hovering over this one, it's not giving me anything but just information.   How do we make that happen? We'll start with the most detailed chart, the one at the bottom, right, the one you're going to drill down and end with.   This is the one I want. I right click. Save script to the data table, which is a good place to put it, or save it to a script window, which this is just puts those clicks.   In the JMP scripting language that I made to make this chart. I'll tell you what you do. Just go save the script to the clipboard.   You can close this now, go to your second chart, go under,   like, where it says United Kingdom, any of these squares. Just right click. Go to hover label. There's a lot of features on hover label, you got this wide open editor, you can do some charts on the fly, but I've already got the JSL saved in my clipboard so just say paste graphlet.   And did it work? Well there's United Kingdom, I click on this. It brings up the tabulate, in fact, it gives me a nice little filter so I can change things around,   which is really cool. So I can say, my second home is in the Philippines, I can see what's going on, what brands are being represented by the Philippines. That's kind of cool. 
All right, same thing: right-click, save script to clipboard. What this one has done, if I show you in the script window, is it has all this tree map Graph Builder stuff I built, but it still has what I did with the Tabulate as the graphlet. OK, so now that it's in the clipboard, I should be able — if I did this right, and cross your fingers here — I'm going to close this one, go into any of these shaded areas, right-click, hover label on the first graph, the high-level graph. And now I'm going to go to paste graphlet. If I did it right... I go to Europe, there's a tree map. I click on the tree map, okay. What's going on in Ireland? I click. Let's click on the graph, and now it brings up all the good international and export beers there in Ireland, and sometimes it's better just to have a Guinness, because the carbs aren't too bad, the calories aren't so bad, and the alcohol level's not too bad. All my friends in Ireland — a good shout-out to them, because we drank many a Beamish together. So there you go. That's how this works, and when you save this one — I'll save this script to my data table and call it my Finished Graphlet Graph — I close out of it and pull back this data. There it is, Finished Graphlet Graph. Click on it, there I go, and it's every bit as interactive as before. And that's the wonderful world of graphlets, so please use this technology if you can. All right, so you've been a fantastic audience here. I'm at time and would love to show you more, but definitely let me know if you have questions. You will, again, get this journal in your link, and also, looking at Community.jmp.com, you can find the links to the older galleries — we did about six for each year, six unique views, so now we're up to about 36. So here we have several blogs that feature graphs, so definitely check your Community blogs. Also, there are some great presentations on YouTube and in the Community on how to actually get the most out of Graph Builder and build dashboards, and there are some great tutorials as well. This is all in your journal, and you'll be able to go click the links and learn more. All right, I thank you so much for joining us for Pictures from the Gallery, and I look forward to the next time we can meet, especially face to face, so take care.
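As with the other views, the treemap from the middle drill-down level has a compact JSL form, sketched below. The column names (Region, Country, Carbs, Calories) are assumptions based on the demo, and the graphlet wiring itself — save script to clipboard, then paste graphlet into a hover label — is genuinely easier to do interactively, as shown above, than to hand-code.

    Names Default To Here( 1 );
    dt = Current Data Table();   // the international beer nutrition table
    dt << Graph Builder(
        Variables(
            X( :Region ),
            X( :Country, Position( 1 ) ),   // nest country inside region on the X axis
            Size( :Carbs ),
            Color( :Calories )
        ),
        Elements( Treemap( X ) )
    );
    // To keep the finished, interactive graph with the data, right-click the
    // report and choose Save Script to Data Table, as in the demo.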
Aurora Tiffany-Davis, JMP Senior Software Developer, SAS Josh Markwordt, JMP Senior Software Developer, SAS Annie Dudley Zangi, JMP Senior Research Statistician Developer, SAS   In this session, we will introduce an exciting feature new in JMP Live 16. You and your colleagues can now get notifications (onscreen and via email) about out-of-control processes.   We will demonstrate control chart warnings from the perspective of a JMP Desktop user, a JMP Live content publisher, and a regular JMP Live user. We will point out which aspects of control chart warnings were available before version 16, and which aspects are new.     Join us to learn: which JMP control chart platforms support warnings; how to control which posts produce notifications; how to control who gets notifications; how to pause notifications while you get a process back under control; and how to review (at a high level) the changes over time for a particular post.     Auto-generated transcript...   Speaker Transcript   Thank you for joining us today to learn about a new feature in JMP Live 16: Control Chart Warnings. You may be thinking, Control Chart Builder has been available in JMP Desktop since version 10 and Control Chart Warnings have been available in JMP Desktop since version 10, so what's actually new here? What's new is that in JMP Live version 16 we now have a way to grab your attention if there's a control chart post that has warnings associated with it — in other words, if there's a process that might have a problem. We do this through the use of onscreen indicators as well as active notifications that can go out to users on screen and by email. I'd like to now introduce just a few of the people who helped to develop this feature. I am Aurora Tiffany-Davis, senior software developer on the JMP Live team, and during today's demonstrations I'll be showing you JMP Live from the perspective of a regular user. We also have with us today Josh Markwordt. Josh is a senior software developer on the JMP HTML5 team, and during today's demonstrations he's going to show you the perspective of a JMP Live content publisher. Finally, we have Annie Dudley Zangi. She is a senior research developer on the JMP statistics team, and she's going to be demonstrating the control chart features within JMP Desktop itself. Annie, would you like to get us started? Thanks, Aurora. Yes, so I'm going to be demoing this by showing you how this works using a simulated data set based on a wine grape trial that happened in California. So what we have here is 31 lots, several cultivars, and yield, Brix sugar, and pH. So let's start with Control Chart Builder. First, I'm going to pull in the yield (that's in kilograms). And then I'll pull in the location as a subgroup variable. I don't care so much about the limits. What I am concerned about instead is whether or not we have any particular lots that are going out of control there — going above the limits or below the limits. And I care about how each of the cultivars is doing, so I'll pull that into the phase zone, which basically subsets all the different cultivars for us. So we can see that we have differences and kind of unique things going on with the different grapes. Next I'm going to turn on the warnings.
And to...the easiest way to do that   is to scroll down under the   control panel and select   warnings and then tests.   We're going to turn on Test 1,   one point beyond the limits, and   then Test 5 as well.   OK, I see no tests have failed.   That's pretty good.   And now you might recall we   were looking at two other   response variables, so I'm   going to turn on the Column   Switcher so we can look at all   three of them. We can just flip   through them using the Column   Switcher.   So we started with yield. We'll   take a look at sugar. Alright,   we can see the Aglianico has   very low sugar content, whereas   the other four have a higher   sugar content. And we can see the   different pH levels for each of   the five grape varieties. OK,   well we've got these 31 lots in.   I think we're ready to publish   it. Josh, would you like to show   us how to send that up?   Thanks Annie, so I have the same   report up that Annie just showed   you and I'm ready to publish to   JMP Live. The first thing I   would need to do as a new   publisher would be to set up a   connection to JMP Live.   If I go to file, publish,   and manage connections.   You can see that   I have a couple of connections   already created, but I'm going   to add a new one.   First you need to give   the connection a name just to help keep   track of multiple connections.   The next thing you need is   the URL of the server you're   trying to connect to,   including the port number.   Finally, at the bottom of the   dialog you can supply an API key   which says for scripting access   only. You only need this if you   are going to be   interacting with the server   using JSL, which we're going to   do later in this demonstration,   so I'm going to get my API key   from JMP Live.   I'm logged in.   I go to my avatar in the upper   right-hand corner and select   settings to see my user   settings. At the top there is   some information about my   account, including the API   key. I click generate new   API key and copy this by   clicking the copy button.   to my clipboard, then I can   return to JMP and simply paste   it in here and click next.   Authenticate   to   JMP Live. And you will be told that   your connection was created   successfully and you can save it   now. It is now present in my   list of connections and ready to   use for publishing. You only   have to do that the very first   time you set up the connection.   The next time you publish, you   can just use it.   So now I can go to file, publish   and publish to JMP Live.   And select my connection from   the dropdown at the top.   Create a new post is selected by   default. So I click next.   And this dialogue looks very   similar to what it did in 15.2,   except now there's an   additional checkbox here that   says enable warnings.   This is present for every   warnings-capable report.   If I hover over it, it says,   "Selecting enable warnings will   notify interested parties when   this post has Control Chart   warnings." I'll get back to who   the interested parties are in   a moment, but first I wanted   to explain what warnings-   capable reports are. In JMP   16 only the Control Chart   Builder is warnings-capable   and able to tell JMP   Live about warnings that are   present within it. There are   plans to expand to other   platforms in the future.   A Control Chart Builder can be   combined with other reports in a   dashboard or tabs report.   And it can be combined with   the Column Switcher as we're   showing in this example.   
Some more complex scenarios could cause an otherwise warnings-capable report to not be able to share warnings, and this Enable Warnings checkbox would be gone. For example, the Column Switcher only works with a single Control Chart Builder. If you try to combine it with multiple control charts in a dashboard, that would no longer be warnings-capable in JMP 16. So back to who the interested parties are: I, as the publisher of the report, am an interested party, as well as the members of any group I publish to, if that group is warnings-enabled. So I am going to publish this report to the Wine Trials group and leave Enable Warnings checked so that JMP will tell JMP Live about any warnings that are present. The report will come up in JMP Live. And the contents of the report look much like they did in 15 and 15.2. The points have tooltips, you're able to brush and select, and the Column Switcher is active, allowing you to explore the results in multiple columns. Now I'm going to hand it over to Aurora so she can show you some of the new features in JMP Live.   Yeah, thank you, Josh. So Josh just published a post to JMP Live that is warnings-enabled, but that doesn't actually have any warnings going on right now, so I can show you what that looks like from the perspective of a regular JMP Live user. And what it looks like is not a whole lot. There really isn't anything to draw my attention to Josh's post. There isn't any special icon showing up on his post. I don't have any new notifications. If I open the post itself, and I open the post details and scroll down, I will see that there's a new section that did not exist prior to JMP Live version 16, and that's the warnings section. This section is here because, by checking that Enable Warnings checkbox in JMP Desktop at publish time, the publisher is saying, I think that other JMP Live users are going to care whether or not there are warnings on my post. And so we have a warnings section here. But right now it just tells us a very reassuring message: there are no warnings present. If we scroll down further, we can see the system comment that JMP Live left in the comments stream at publish time, and again, this just tells us a nice reassuring message: this post has zero active control chart warnings. I'll pass it back to Annie now so that she can walk us through the next step of the grape trial.   Thanks, Aurora. So as I said before, we're getting new data in. We had 31 locations before. Now we have 32. The original study was adding some actual restricted-irrigation lots so that they could find out how the five different grapes responded to drier conditions. So if we take a look at the control chart with these restricted values, we can see that the yield is lower in this new lot that was just added. And in fact, with the Tempranillo grape it is below the lower limit. We can take a look at the sugar to see how that responded, and we can see that the sugar actually went up for our new restricted-irrigation dry spot. The pH wasn't anything abnormal. So I think we need to update this. Josh, do you want to show us how?   Yes, so new in JMP Live 16 is the ability to update just the data of a report.
This is useful because you don't need to rerun the JSL or recreate the report in JMP and republish. You simply want to update the existing report with new data. This can be done directly from the JMP Live UI by selecting Details and scrolling down to the data section, where you can view the data table that is associated with the report, and clicking Manage to update it. Click on Update Data, select Update next to the table you want to update, and click Submit. You're returned to the report. You will see that it is regenerating, and the updated content shows the warnings that Annie mentioned. Now I'm going to hand it over to Aurora to demonstrate some of the other ways that JMP Live lets you know you have warnings.   Thank you, Josh. OK, so Josh has taken a post that was warnings-enabled, and now he's updated the data on it so there actually are warnings now, so I can show you what that looks like from the perspective of a regular JMP Live user. We can see now that his post looks a bit different than it did before. It has a new red icon on it that draws the eye, and when we hover on that icon it says there are control chart warnings in this post. What that's telling me, in a little bit more detail, is: first, I know that the publisher of this post cares about control chart warnings, because the publisher has chosen to turn on those tests within JMP Desktop. Second, I know the publisher thinks that other JMP Live users might care about control chart warnings on this post, because that publisher has chosen to enable that JMP Live feature. And third, of course, I know that there actually are control chart warnings on the post. I'll see this icon on any post that meets these criteria. I also see it on a folder if that folder has a post inside of it that fulfills all these same criteria. If I click on this icon, I am taken to the warnings section of the post details, just like I showed you last time, only now there's more interesting stuff in this section. Now it tells me that there are control chart warnings and which columns those warnings are present on (yield and Brix sugar), and it tells me some details about the warnings. But if I want more details, I can scroll down just a bit and click Open Log. That tells me a lot. It tells me, for every column, how many warnings there are; what that translates to in terms of warning rate; which tests the publisher actually decided to turn on in JMP Desktop; and also specifically which data points failed tests and which tests they failed. I can also copy this to my clipboard.   If I scroll down further to the comments stream, I can see a new system comment. It says the post was regenerated because the post content was updated, and when the post content was updated, there were control chart warnings on the following columns. So you can see here that this comments stream can serve as kind of a high-level history of what's been going on with the post. Right now I'll leave Josh a quick comment saying it looks like reduced irrigation had a big impact. Now, the icon that I saw on the card would be seen by any JMP Live user, and any JMP Live user, if they open the post details, would see these system comments and they would see this warnings section. But not just any JMP Live user would get a new notification actively pushed to them. I do have that notification.
I can see it up here in my notifications tray. And I also have one sitting in my email inbox right now, and it's very detailed. The email contains all of the information that was present when we saw Open Log just a moment ago. Now, why did I get this notification? I got it because I'm a member of the group that the post was published to. And furthermore, the administrator of that group has turned on this JMP Live warnings feature. They've enabled warnings for the group itself, and by doing that, the group admin was telling JMP Live: I think the members of my group are really going to care about control chart warnings, so much so that you should actively push notifications out to them if we get any new control chart warnings on the posts in this group. In other words, my group admin agrees with the publisher. They both want to draw my attention to these potential problems. Now I'll turn it back to Annie so she can walk us through the next part of the grape trial.   Thanks, Aurora. OK, so we last looked at the adding of the restricted-irrigation lot, and now we have a couple of new lots come in. Nothing special about those. Let's take a look at the graph. What do we see here? Well, we see the restricted irrigation, but nothing special with those. Let's see if anything happened with the sugar. No, we see the two new points at the end after the restricted irrigation, but nothing special there and not a whole lot new. But we do still need to update the graph and update it on the web. So Josh, do you want to show us how we can update it this time?   So I have already demonstrated how you could update the data through the JMP Live UI, but you can also do this through JSL. First, I'm going to declare a couple of variables, including the report ID. The report ID can just be found at the end of the URL after the last slash; it's a series of letters and numbers that identifies the report to replace. There are ways to retrieve the report ID through JSL, which I will show in a moment, but for now we're just going to save that. We're also going to open the updated data set that Annie just showed you so that we can provide it to JMP Live. So if I run these, it opens the data table. The next thing we need to do is to create a connection to JMP Live. This will use the named connection that I created at the beginning of the demo, the Discovery Demo server, here. I use the New JMP Live command, which will create a JMP Live connection object. I provide an existing connection, and it can prompt if needed, but I've already authenticated. So if I run this, I get a new connection. As I mentioned at the beginning, you can use this connection to search for reports, as well as get a particular report object by ID. I'm going to use the variable that I pasted in to get the report we've been working on. From that result object you can get a scriptable report, a live report that you can examine for a number of pieces of information. Here I grabbed a live report and got the ID. I got the title, description, and the URL. And you can see in the log that the ID I retrieved matches the one that I pasted in. I also got the title. The description is blank because we didn't provide one when we originally published.
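A condensed JSL sketch of the steps Josh just described, creating the connection and retrieving a scriptable report by ID. The connection name and report ID are placeholders, and the message names follow the talk's description, so check the JMP Live scripting documentation for the exact forms in your version:

// The ID is the string copied from the end of the post's URL (placeholder shown here).
reportID = "abc123XYZ";
// Use a named, managed connection; it can prompt for credentials if the stored token has expired.
liveConnection = New JMP Live( Connection( "Discovery Demo Server" ) );
// Get the report we have been working on and turn the result into a scriptable live report.
result = liveConnection << Get Report( ID( reportID ) );
liveReport = result << As Scriptable;
// These mirror the values printed to the log in the demo.
Show( liveReport << Get ID );
Show( liveReport << Get Title );
Show( liveReport << Get URL );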
And I also got the URL, the full URL, that I could use either to open the report through the script or for some other purpose, such as creating a larger report that links to it. In preparation for the next step, I'm just going to get the current date and time, which I'm going to use to decorate the title a bit, to prove that we've updated it through JSL. But the key command here is the Update Data command, which lets us update just the data of the report, just like I did through the JMP Live UI. It takes the ID, which here I'm going to retrieve from the live report object. And then it takes the Data command, to which you provide the new data table that you are uploading, as well as the name of the current data table that you want to replace. That update result object can also be queried to retrieve a number of pieces of information, like whether it was successful, the status you got back, and any error messages, which could be useful in a more automated setting to provide details as to why publishing the new data failed. So I'm going to run this. And it said that it was successful. And if I bring up my report, you get this popup that says an updated version of this report is available. Now I can choose to dismiss it and continue looking at the current context I have, but I'm going to say to reload. And we see the new data points here without having to refresh the entire page. We go back to JMP. The last thing I want to do is show that other pieces of the report can also be manipulated through JSL. Here I'm simply going to give it a new title. I don't like the one that was provided by default, so I had declared this variable with a new title, and I'm going to append to that the date and time, to help distinguish when this update was done. I'm going to use the Set Title command to send that to the live report, and then close my data table to wrap up. Run these and bring up the report. In a moment you'll see the title refresh both here and in the details. Here it is with the date and time. Now I'm going to hand it back over to Aurora so she can show you more of what happened in JMP Live with this update.   Thank you, Josh. So I can see Josh has updated the title on his post. And he has updated a post that was warnings-enabled and had warnings. He's updated it with new data, and the new data, just like the previous data, has control chart warnings. So I can show you what this kind of persistent-warning situation looks like to a regular JMP Live user. I can see here that the icon that draws the eye and says there are control chart warnings in the post is still present in a persistent-warning scenario. If I open the post, and I open the post details, I can see the warnings section. Only now it tells me I have warnings on three columns: yield, Brix sugar, and pH. If I scroll down to the comments stream, I can see that same notification about the warnings here in the comments stream. And I also, I'd like to point out, have a new active notification that is pushed out to me. I have a new one here, and that's telling me that the new data, just like the old data, does have warnings associated with it. Now I'll turn it back to Annie and she can take us through the next step of the grape trial.   Thanks, Aurora. So last we talked, we were looking at lots 33 and 34 being added.
Now we've got one new lot come in. That's lot #35. Let's see how it looks. Oh my goodness, the yield is way out of control. This is just unbelievable; this is just remarkable. How does the sugar look? Well, the sugar looks about normal, like we would expect. The pH is also about where we would expect. This is something that's clearly going to involve some investigation, but we still need to report this. Josh, would you like to update the web?   So, we've demonstrated that you can update just the data of the report, which is useful when you want to keep the report contents the same and just update the data. But there's also the ability to replace the report, which existed before, and it's still useful if you want to update the contents of the report itself. I realize that, in addition to updating the data, I don't really want to have this moving range chart at the bottom. It doesn't really make sense in this context, so I'm going to right-click and say Remove Dispersion Chart and get rid of that. So now the report is ready to be replaced. I go to File, Publish, Publish to JMP Live, and it looks like it did before, except instead of selecting Create a New Post, I'm going to decide to replace an existing post and click Next. New in JMP 16, we've updated this search window. My report is right at the top of the list, but you also have the ability to search by keyword and restrict the number of reports if you've published a lot, or this was a while ago and you have difficulty finding it. I'm going to pick the report I want to replace and click Next. On this screen I get a summary of the existing picture and title. I'm going to update the title, just to draw attention to the fact that I replaced it, and give it a description. This time I know something might be wrong with the yield. So while the report does have warnings, this time I'm going to decide to uncheck the Enable Warnings checkbox. Information about the warnings will still be sent to JMP Live and be available at a later time, but I don't want everyone to get notified about the warnings just yet. Click Publish. And again, I'm told that my report has been updated and I can reload it. And the new information for the title and description appears in the details. I'll hand it back to Aurora so she can show you what else has happened in JMP Live.   Thank you, Josh. So just to summarize again, Josh has taken a post that has control chart warnings in it, but this time when he republished it, he decided not to enable the JMP Live warnings feature. I'm going to show you what that looks like to a regular JMP Live user, because the content publisher has control over whether their control chart warnings are exposed on JMP Live in a way that's going to draw the attention of other users. And Josh decided that that attention really wouldn't be productive right now. So what does it look like to me? It really doesn't look like a whole lot. There is no icon on the card to draw my eye to it. I don't have a new notification.
If I open the post and I open the post details, and I scroll down, that warnings section that I've shown you several times before isn't even present, because Josh has said, I don't think other JMP Live users really need to know about the state of the warnings on this post right now. Furthermore, if I scroll down to the comments stream, I can go back all the way to the beginning and see that when it was published, it did not have control chart warnings; then it was updated and it did; it was updated again and it still did. The most recent comment that I see says Josh Markwordt has republished the post, and it doesn't tell me anything one way or the other about control chart warnings. And again, that's because the publisher has control over whether these things are exposed to other JMP Live users.   While I'm here, I'll leave a quick comment, because I see in the description that Josh wants us to look at the yield. And it looks very, very off to me, so I'm going to say, could this be a data entry error? Oops, that was my scroll mistake. I'll submit that and then I'll turn it back over to Annie so that they can do some troubleshooting on this process.   Thanks, Aurora. So we went back and we talked with the data entry people, and it turns out they were entering in pounds instead of kilograms. As you notice right here, we're in kilograms. So we updated the data, did a little division on it, and now the yield looks more like what we would expect. The sugar and the pH have been unaffected. Josh, would you like to show how to republish?   Yes, so we've shown several ways to update the data. I'm going to go back to the first way, updating it through the JMP Live UI. I'll click on Details, scroll down to the data section again, and click Manage, then Update Data. And when I click Update, I'm going to select the fixed data that Annie just presented and submit. Go back to the report, see it regenerate. And like we noted, the yield is back to looking normal. I'm going to leave a comment for Aurora to let her know that we fixed the units. Then I'll hand it back to her to show you what has changed in JMP Live. Aurora... Thank you, Josh. So I can see his post here. I can open it, and right away, looking at the report itself, I can see that things look a lot better on the yield. So I'm curious about what that was. I'm going to scroll down here, and actually I can get here because I notice I have a new notification. What's that about? I click on it and I see that Josh has replied to my comment; that will take me directly to the post also. And if I scroll down to those comments and I look at that reply, I can see: OK, the units were in pounds instead of kilograms. It's been fixed now. Fantastic. So it looks like the grape trial is back on track and we're making good progress. I'd like to take a step back now and talk about the different kinds of JMP Live users that there are and how they interact with control chart warnings. We've talked a lot during these demonstrations about the power that Josh had as the content publisher. The content publisher has control over which tests are turned on or not in JMP Desktop. And the publisher also has control over whether or not to enable this JMP Live feature on the post.
But before, when I got a notification about control chart warnings, I mentioned that I got it because the post is published to a particular group. So I'd like to show you a little bit more about those groups. If I go to the groups page, I can see the Wine Trials group that this post has been published to, and I can see that it is warnings-enabled. If I hover over that, it says control chart warning notifications will be sent to members of this group. Let's open that group up. You can see here as well that it's enabled, and because I actually happen to be the administrator of this group, I can change that. If you come over here to the overflow menu, which is these three dots, and click that, I have the option to disable warnings and stop sending these notifications out to my group members. I can also change it back. If I change it from disabled to enabled, then I get a prompt, and it says send notifications now. JMP Live is telling me: OK, you've got a group; it's got some posts in it; because you didn't previously care about control chart warnings in this group, there could be posts in this group already that have warnings, and none of your members know about it. So now that you do care about control chart warnings in this group, would you like me to go ahead and send out notifications to all of the members of the group about any control chart warnings that already exist on the posts in here? I'll say no for now, because we already know about this particular problem.   But what if I'm not a content publisher and I'm not a group administrator? I'm just a regular JMP Live user, and I'm getting notifications about other people's processes. As with any other kind of notification, I can opt out. And I would do that by going up here and clicking on my notification bell icon, then clicking on the settings icon. And if I scroll down, I'll see that there is a new type of notification called control chart warnings. I can toggle this on or off to say whether or not I want these notifications at all. And if I do, I can let JMP Live know with what frequency I want to receive emails about these notifications. I think that Josh also has some closing thoughts for us, so I'll turn it over to him. Josh?   Thanks, Aurora. So we demonstrated the new control chart warnings in JMP Live 16 and how they let you notify interested parties about tests that generate warnings in Control Chart Builder. We've shown some new features in the JMP Live UI that draw attention to the warnings and give you details about what occurred, and settings to control the notifications and warnings from the perspective of both the publisher and group admins. We've also shown that there are several ways to update reports and get data into JMP Live 16. You can publish a report from the JMP desktop. You can update just the data, which is a new feature in JMP Live 16, through both the JMP Live UI and JSL. And you can also still republish a report from the JMP desktop to change its contents. I only briefly touched on the JSL capabilities in JMP Live 16, so if you're interested in more details or in how to take this process and automate it, please see Brian Corcoran's talk on the JMP Community, "The Morning Update: Creating an Automated Daily Report to Viewers Using Internet-Based Data."
It takes a control chart warnings example and shows how you might make this a daily process that publishes automatically. Please see our talk on the JMP Community and leave us feedback. Finally, we wanted to say thank you. We are just a few members of the several teams that have worked on this feature. On the JMP desktop side, in Statistics, Annie Dudley Zangi and Tonya Mauldin worked on Control Chart Builder. The JMP Live team, led by Eric Hill, contributed to both this feature and many of the other features that we got to indirectly show while giving this demo. The JMP Interactive HTML team, led by John Powell, created the content of the control chart reports in JMP Live. Our UX and design work is done by Stephanie Mencia, and our project manager is Daniel Valente. Thank you.   Thank you, everyone. Thank you.
Brian Corcoran, JMP Director of Research and Development, SAS   JMP Live is a powerful new collaboration tool. But it is only as useful as the quality of the content that you provide to it. This talk discusses the development of a JMP JSL script to acquire data through the internet via a REST API. It will then show how to publish an initial report to JMP Live and automatically update the data within that same report on a daily basis. In this fashion you can provide automated reporting to viewers who just want to see the latest data when they start work in the morning.     Auto-generated transcript...   Speaker Transcript Brian Corcoran Welcome to the morning update. This is my talk for JMP Discovery Europe 2021. My name is Brian Corcoran and I am a JMP development manager. So what are we hoping to do today? I would like to show you how to create a report in JMP based on an internet-based data provider using a REST protocol. Once we do that, I'm going to introduce you to how we can publish this report to JMP Live using the updated JMP scripting engine that we've put into JMP 16. Finally, I'm going to show you how you can automate this task, so that every day, when you come into work, reports have already been updated for you and you can just view them with your morning coffee or tea. Okay, so first let's talk about internet data providers. Most of them are based on something called a REST protocol. It's a stateless call; essentially it looks like a URL with some parameters tagged on to the end, and an increasing number of organizations are using it to expose their public data to end users. Some examples are the World Bank, the US Census, and Google. So JMP has a facility to help you with this called HTTP Request. It will allow you to access these services. Typically they use something called a GET or POST verb, and HTTP Request will allow you to use those. For this particular report I'm going to use the Johns Hopkins COVID REST API, with some data from the pandemic. Johns Hopkins is the university in the United States that aggregates this data from all over the world and then provides this free public API to access it. Now there is also a premium version of this, and that gives you better access and more granularity with the data, but we're going to try to get by with the free version for now, and hopefully you can take some of the scripts that I give you and try them out yourself. So what does the REST API look like? Well, let's take a look. Here's an example from Johns Hopkins. It starts out with this base URL, which in this case is api.covid19api.com. And here, you can kind of look at the URL and say, hey, we're asking for the total confirmed cases for a country. Where you see this bracketed country, you have to actually insert the name. In my case, I'm going to use Germany (they use the English names), but there are probably 100 countries where you could try this out. The Johns Hopkins API requires you to supply a starting and ending date after this base URL, where you see the question mark. Those are essentially parameters to the API call you're making, and they will allow us to return values within the date range that we specify. And it's kind of a long format: the year, month, and day, a T for time, and then the 24-hour time with a Z appended to it. So fortunately JMP has facilities to help you with that.
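As a rough JSL sketch of that timestamp (the format string shown is an assumption; JMP ships several ISO-style date-time formats, so check the Format documentation for the exact name):

// Build a "2020-09-01T00:00:00Z" style string from the current JMP date-time value.
now = Today();
stamp = Format( now, "yyyy-mm-ddThh:mm:ss" ) || "Z";
Show( stamp );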
You can use a Format call, and Today() will give you the exact time for right now, along with the date. We specify the format string that we want to use, and then we can just append a Z to the end to get the format we need for Johns Hopkins. So, like I mentioned, REST calls typically use either a GET or POST verb. Johns Hopkins uses a GET; I'm going to jump out of PowerPoint for a minute to show that. If you go to that website that I had in the slide, or also in the paper, you'll see that it provides the APIs by type and shows you essentially how you pass the information and what you expect to get back. Alright, so you can kind of go through here, see what each one requires, and you can see there are premium categories, where you have to pay so much per month to access that. Okay. So what do you get back? Well, you get back JSON, which is just a bunch of strings in name-value pairs (for instance cases, colon, and then a numeric string) that's going to represent the number of cases for this particular observation. And JMP has nice facilities to access JSON, and we'll show you that in a minute. Here is our actual HTTP Request call. We're just going to pass in our URL, the method we want, which is GET, and then this Secure(0). Why do we do that? Well, Johns Hopkins is a public API and it does not want to use secure socket layer, or SSL, so we need to turn that off or the call will fail. And then we just make our call with the Send command and our JSON will be returned in the data. So let's drill down a little bit into a script. I'm going to get out of PowerPoint and we're going to bring up JMP. I'm using JMP Pro 16, but this will work with regular JMP as well. And I'll mention that this script is included with the conference materials, along with the paper that we'll be looking at. Okay, so what am I doing here to set up? First of all, I'm saying I'm using the Documents folder for where I'm going to store my data, and I'm going to store it in a table named covid19_de.jmp. Later on I'm going to generate a report and I'm going to make sure it always has this name. It's very important, in this case, that it's a standardized name, and I'll show you why later. Finally, I'm just combining my Documents path and my file name into a full path to use. All right, we're not going to worry about this date formatting function here, and we're going to go into the meat of how we acquire our data. Right, so we're going to use a pattern here, and that is, we're going to assume that we've never run the script before. If we do not find a file where we have previously accumulated data, then we will create a data table and fill it in with values. However, if we already find a data table, then we will just update the table with the latest day's worth of data, just one value. And that way we don't have to worry about whether we've run this before or not. We can just run this script kind of blindly; you can give it to somebody else and it'll work for them. Alright, so here we're going to say if our file exists, our data table in the Documents folder, just go ahead and open it and, by the way, go ahead and set this flag to say we have data already. Otherwise I'm going to create a data table with a date column, cases, and daily change. Right. Now the next part is we're going to format our strings for the call to Johns Hopkins. Remember we needed to have a from and to range.
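Pulling the pieces described so far into one condensed sketch (the endpoint path, file name, and column formats are placeholders patterned after the talk, and fromDate and toDate are the Z-terminated strings whose construction is described next; this is not the full conference script):

// Open the accumulated table if it exists; otherwise create it.
tablePath = "$DOCUMENTS/covid19_de.jmp";
If( File Exists( tablePath ),
	dt = Open( tablePath );
	haveData = 1,
	dt = New Table( "covid19_de",
		New Column( "Date", Numeric, "Continuous", Format( "m/d/y", 12 ) ),
		New Column( "Cases", Numeric, "Continuous" ),
		New Column( "Daily Change", Numeric, "Continuous" )
	);
	haveData = 0
);
// Build the REST call and send it (Secure(0) because this public API does not use SSL).
url = "https://api.covid19api.com/total/country/germany/status/confirmed"
	|| "?from=" || fromDate || "&to=" || toDate;
request = New HTTP Request( URL( url ), Method( "GET" ), Secure( 0 ) );
rawJSON = request << Send;
// Parse JSON turns the returned text into a list of name/value records.
parsed = Parse JSON( rawJSON );
nObs = N Items( parsed );
// For example, parsed[nObs]["Cases"] would be the latest cumulative case count.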
Alright, so here's the string we already looked at. This is for the today value. In the case where I only need the current day's value, I'm just updating my data: I'm going to go from yesterday to today, essentially. I'm going to create yesterday by saying today minus two days, and I'm going to go to today. I do two days because, depending on the time and when the data gets updated at Johns Hopkins, sometimes you get one value and sometimes you get two. If I get two, I will just take the most recent value, but I want to make sure I get something. Alright, the next thing is, if I've never gotten data, I want to have a start date. Now I'm arbitrarily going to start on September 1 of 2020; you could put whatever you wanted. Here you see I'm actually using August 31, and that's because Johns Hopkins does not actually give us a value for the change between days; they will only give you the total cumulative cases for pandemic data. So in order to calculate the change, I have to subtract yesterday's value from today's value. Well, if I want to start on September 1, then I need the August 31 data in order to compute the change for September 1, so that's why we do that. If you pay for the premium API at Johns Hopkins, you can get the change value. Alright, so here's our URL that we discussed earlier. This is important here: if we have data, our URL is just going to start from yesterday; but if we don't (and we're using this if statement), then we're going to start from September 1, our start date. And then we're just going to go to today. I show this URL just for debugging purposes, but then this is where we actually do our request and Send call. I put in a little wait to make sure that it has a chance to run. And here is where we get our JSON data back, and this is where JMP has a really handy facility. Parse JSON will take this big block of strings and break it into an array of name-value pairs. You can then call N Items on that array to find out how many pieces of data you've gotten, and you can reference that data as an array with array subscripts. Okay. So now let's navigate down a little bit. Here we're going to fill in our data table. If we already have data, then we're just going to add one row to the table at the end, and we're going to fill in that data value, along with our change, which we compute from today's value versus yesterday's value. Then we just save it off. Now, if we've never created it before, then we're going to add the number of rows we have, minus one because there's header information, and then we're going to cycle through this and calculate all our daily changes and date values and put the case data in the table. And I think I'm going to demonstrate, hopefully, running this from scratch right now; we're at something like 163 days. And then we will save out that table to the Documents folder. Okay, now that we have the data, we can think about publishing to JMP Live. But this is probably a good chance for me to describe how you do JSL programming for JMP Live and how we've changed it in JMP Live 16 and JMP 16. Let me bring up the paper associated with this talk; you'll have this in the Community as well. Alright, so in JMP 16, we rewrote the scripting to be, hopefully, more powerful but easier to use, and the scripting revolves around the idea of having a connection, or managed connection information, stored away.
What happens is that...let's bring up JMP again. I'll show you what that means. You go to File, Publish, Manage Connections. And we can add one. Here, you would specify a connection name of your choice, and then the URL where your JMP Live site is (JMP Live is essentially a REST service itself). If your administrator requires a secret API key to enable scripting, you would need to supply it here. When you do this and you hit the Next button, you're going to be prompted, most likely, depending on your authentication mechanism, for credentials. When you enter those, it will essentially give you an access token, which means that it stores away on disk for you, not your credentials, but just this access token that allows you to access this site and script to it. That way, without having to provide any of this information in the script, you can just reference the connection name that you supply. And I'll show you one; for instance, this is JMP Live Daily, which is what I'm going to use. Here's my URL endpoint and my API key. I can just reference JMP Live Daily, and it will know how to connect within my script. Okay. So, to create the connection, then, I just say New JMP Live and the name of my connection. Now here I'm saying, let's prompt if we need to. What does that mean? Well, if for some reason your credentials expire or your access token is old, then it will prompt you to enter your credentials once the script starts. If you don't supply this and your credentials have expired, then the script will just fail. Okay. So how do we actually publish a report to JMP Live? Here's an example of just a simple bivariate that you might run out of Big Class. So, to make that a published report, you're going to just say, create a new web report and assign it to a variable. Then I'm going to take this bivariate reference and say, add that report to the web report, and I'm going to optionally provide a title and a description. And then I'm just going to call Publish, and the publish will return a result. We might publish up to JMP Live and it would look something like that. The result will tell us if we succeeded, and since publication is actually like an HTTP call, we can look at the status, if we so desire, or an error message. Okay. All right. The other interesting thing is that if the result of our operation is something like adding a report or a folder to JMP Live, the result we get back contains information that allows us to further manipulate that item. We can call As Scriptable to use that information to create an object within scripting, like a report, that we can then access. For instance, after I've done this, I can use the report and set my report title. It will go up to JMP Live and change the report title. Okay. The whole idea within JMP Live scripting now is around manipulating reports and folders, searching folders, searching for reports, things like that. The report understands that it has an identifier. And this identifier, if you were to look at it, is a long alphanumeric string that really would be awkward to enter into a script or remember. But it's also required for you to uniquely identify that report. And the reason you might want to uniquely identify it: let's suppose you want to delete it.
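A hedged JSL sketch of that publish flow, using the Big Class bivariate from the example. The connection name is a placeholder, and the exact placement of the title, description, and other options follows the talk's description, so the real script may differ slightly:

// Connect with a named, managed connection (credentials come from the stored access token).
liveConnection = New JMP Live( Connection( "JMP Live Daily" ) );
// Build a simple report to publish.
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
biv = dt << Bivariate( Y( :weight ), X( :height ) );
// Wrap it in a web report, add a title and description, and publish it.
webReport = New Web Report();
webReport << Add Report( biv, Title( "Bivariate of weight by height" ), Description( "Simple example" ) );
result = liveConnection << Publish( webReport );
// The result can be queried for success, HTTP status, and error messages, and it can be
// turned into a scriptable report object for further manipulation.
liveReport = result << As Scriptable;
liveReport << Set Title( "Bivariate of weight by height (renamed)" );
reportID = liveReport << Get ID;   // the long alphanumeric identifier discussed next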
You tell JMP Live that you want to delete the report, and you can use Get ID on that report object to supply that unique ID so JMP Live knows which one to remove. I'm showing you these particular items because they'll be important in our ultimate script that we hope to produce. The other area where it can be really important to know that ID is in searching. Now here, I can find reports. For instance, let's suppose I want to find all the bivariate reports that start with Biv; I can just ask JMP Live to find reports and return a list of results. Then I can turn that list into a list of reports. There's a function called Get Number of Items on that report list that allows me to cycle through each one by subscript. And then, if I so desire, I could delete all of them if I wanted to. So the search capability is new and, we hope, fairly powerful for you to use to do large operations on a JMP Live site. Right, so there's one operation that we need to address too, before we're really ready to show our script off a little bit further, and that is Update Data. In JMP Live 16, we've added the capability to update just the data for a report without having to publish the entire report back up to JMP Live. Let's suppose you get a report just the way you want it, and maybe it's a little customized and you like the appearance and you don't want to mess with it. But you do want to update the data and have it recalculated. Well, now you can pass just the data table for that report up to JMP Live, and also reduce the transmission time, and you do that by calling Update Data, providing the report ID, and then just the data table with the updated data. The report will recalculate on JMP Live, rather than having to do it on your desktop, and anybody who happens to be viewing that report will also see the update. Okay, so now we kind of have all the tools that we need to actually do our script, so let's go take a look. All right. So, here's where I create my JMP Live connection. And now I'm going to create a control chart. Now, a control chart really is not the ideal analysis platform for this data, so why am I using it? Well, two reasons. One, it is nice to see day-to-day changes, and two, it allows me to plug, or advertise, the fact that we have another new feature in JMP Live 16, and that is to show control chart warnings. If you publish control charts and there are observations that are out of bounds that would generate a warning on the desktop, then when you publish it to JMP Live or update the data, JMP Live can also generate warnings to send to anybody who's subscribed to that report and wants to get an email or a notification within the website that something is out of bounds. This can be really useful for things like process control. My colleagues are doing a talk on control chart warnings, and I encourage you to also check that out if you have a chance. How did I generate this? Well, I just went to JMP with some older data, and there's a facility within JMP, if you're doing an analysis, where you can just say save script to script window. I just took that information and plugged it into this script, so that's pretty handy. Okay. So let's get into the meat of how we're going to publish a report, and I promise you, we will run this in a little while. Okay, so once again I'm going to have a pattern here. I'm going to look for the report to see if it is already up on JMP Live.
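That publish-or-update pattern looks roughly like this in JSL. The search filter, the list accessors, and the placement of Public(1) are assumptions pieced together from the talk, so treat it as a sketch and consult the JMP Live scripting documentation for the exact message forms:

// liveConnection, dt (the updated table), and ccb (the Control Chart Builder reference)
// come from the earlier steps; reportName is the standardized report title.
reportName = "covid19_de";
found = liveConnection << Find Reports( Search( reportName ) );   // the talk also filters to "published by me"
reportList = found << As Scriptable;                              // list of report objects
If( (reportList << Get Number of Items) == 0,
	// Never published before: wrap the Control Chart Builder output and publish it publicly.
	webReport = New Web Report();
	webReport << Add Report( ccb );
	liveConnection << Publish( webReport, Public( 1 ) ),
	// Already published: push just the new data and let JMP Live regenerate the report.
	liveReport = reportList[1];
	liveConnection << Update Data(
		ID( liveReport << Get ID ),
		Data( dt, "covid19_de" )   // new table plus the name of the table it replaces
	)
);
// Quit();   // left commented for interactive runs; uncomment for the scheduled, unattended run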
If it is not, I will publish it, but if it is already up there, then I will just take the updated data table that we created earlier and provide that to update the data on the server side and have it recalculate the report if it sees fit. All right, so how do I do that? First of all, I'm going to search for our report (and remember, we use a standardized report name that I had specified earlier, so we're always looking for the same one). And I'm also going to say only published by me, just in case somebody else had a report of the same name; I wouldn't want to get that. All right, I'll turn it into a list that I can look at. And if the number of reports is zero, that means I didn't find a previously published report. So I'm going to create a new web report, add my Control Chart Builder output to it, and publish it. And I want to make sure that it's available for everybody by saying Public(1). If I did find one, I'm going to take that report, referencing the first item returned, and I'm just going to update the data here, using the updated table that we generated previously. The rest of this is just debugging information that I show in the log, just to see if everything went alright, but it's not really necessary. Finally, at the end here, I have a Quit statement. When we actually go to automate this later, this is important, because we want JMP to shut down and close all the windows. Otherwise, the next time we go to run it, it might take a look, see JMP's already running, and think that things are hung from a previous operation. However, for interactive operation, I'm going to comment this out right now. Okay. So I think we're ready to go here. We can give this a try and we'll hope for the best. Sometimes Johns Hopkins gets very busy and will actually reject the request to get the data, which would be unfortunate, but let's try this out. And just to show you, in the Documents folder at this point, I do not have a JMP table with the name that I'm specifying, and if I go to the JMP Live site that I hope to publish to, we don't see any output from Control Chart Builder there. Alright, so let's give this a try. Right, there's our control chart. I'm going to refresh JMP Live. Okay, there's our Control Chart Builder output. This one did have warnings to it. If we look at this within JMP Live, we can hover over points and see what the most recent data is. This is for February 10; I'm on the 11th, so we have up-to-date data. This is some data that is considered out of control, based on the moving average from back in January and late December. Right, so far, so good. If I open up my Documents folder and refresh that, we can see that our JMP table has been created. All right. And we see now this has been published a minute ago. Alright, so let's go ahead and shut this down. And we're going to...actually, I didn't want to do that, hold on a second. We're going to cheat a little here. Let's go ahead and delete the last value that we got. Then we're going to save that and close it down. So we're going to simulate the fact that we have not run it today yet, and then we're going to run this again. Okay, it just fetched the last value. And we go up to our website. And we see it just regenerated a few seconds ago again. So in this case, we just updated the data and we just got the last value. If I were to bring up my mail, I happen to be subscribed for warnings, and hopefully we might see a little update here too.
We're getting notifications that there was a publication of this Control Chart Builder and there were warnings, and if I want, I can go and see where those failed and what points are out of bounds. Okay. Alright, so I think we're in good shape for trying the automated task. So I'm going to go ahead and delete this post. Right. Let's shut this down. Shut this down. I am going to put our Quit back in there, because now we're going to need that for when we run in an automated fashion. And I will close this. Go to the Documents folder, and I'm going to delete our data and pretend that we're running this from scratch. Right. And let's make sure JMP is shut down. Okay. Now, if you've seen some of my previous Discovery talks, you may have seen me use the Task Scheduler before. It's a popular topic with me. You just type in Task Scheduler here on Windows; I hope you saw that. On the Mac, you would have to use Automator or a cron job; I would suggest Automator. All right, but the Task Scheduler allows you to run just about anything on a regular basis. So let's go ahead and create a new task. We'll just create a task here and we'll name it COVID Data for Germany. I want to run with highest privileges. I'm going to run only when the user is logged in, because I don't want to enter credential data, but I would suggest selecting run whether the user is logged on or not if you're doing this for a production purpose, because if your machine gets rebooted due to a Windows update or some other reason, you want it to still run, and this option will allow you to do that. It will require you to enter your credentials when you finally save out this task. Alright. So, for triggers, what is that? That's when I want it to run, so let's go ahead and do that. Let's say we want to run it daily, starting tomorrow. And maybe I just want to run it at six o'clock a.m., before I get in in the morning, whatever "get in" means anymore. Before I roll out of bed and go to work. All right. So I'm going to stop this task if it runs longer than 30 minutes, because that probably means it's hung. And otherwise I think we're good to go there. So what action do we want to perform? Well, we want to run JMP, so you have to navigate to where jmp.exe is installed, which is in Program Files, SAS, and either JMP or JMP Pro 16. Go ahead and select that. And then our argument is our JSL script, which, unfortunately, you have to enter manually here, which I'll do. But just make sure that you're careful with that. Okay. Now, under settings I'm going to make sure that we allow the task to be run on demand, because that'll allow us to try it right now and make sure it works. And if there's already one running, make sure to stop it; that probably means it's hung. And stop the task if it runs longer than an hour, again just in case it hangs. Alright, so there's our task to run every day. So we can debug it, essentially, by trying it out right now, since we allowed it to be run on demand. Let's go ahead, right-click the mouse button and say Run. Hopefully we'll see the taskbar; JMP will briefly come up, run, and then go away. Looking down here, hopefully, things are happening. Okay, and then it's gone. Let's take a look at our website; I'll refresh that. And there is our report, generated a few seconds ago. If we look at our folder, we can see that our JMP table's been generated, and hopefully tomorrow morning at 6 a.m. our task will run and get us a fresh batch of data and an updated report.
And when we come in with our coffee or tea, we can take a look at that and make our decisions for the day. So that concludes my talk. I hope one of the three aspects we've discussed today (internet-based data acquisition, JMP Live scripting, or automated task generation) has helped you with your job. Thank you for attending, and I hope you enjoy the rest of the conference. Bye.