Level: Intermediate
The Model Driven Multivariate Control Chart (MDMVCC) platform builds control charts based on principal component analysis (PCA) or partial least squares (PLS) models. These charts can be used for fault detection and diagnosis in multivariate data. Here we demonstrate monitoring with a PLS-based MDMVCC using the Tennessee Eastman process, a simulated industrial chemical process in which quality and process variables are measured while a chemical reactor produces a liquid product from gaseous reactants. First, we perform fault diagnosis in an offline setting. This normally requires moving back and forth among multivariate control charts, univariate control charts, and process diagnostic reports, which the MDMVCC platform makes remarkably easy. Next, we connect JMP to an external database and demonstrate online monitoring with the MDMVCC platform. Because there is a time lag before all measurements of the product quality variables become available, fault detection based on them is usually delayed as well. In a PLS-based MDMVCC, variation in the quality variables is monitored as a function of the process variables, which are generally available much sooner, so faults can be detected earlier.

Jeremy Ash earned his degree in bioinformatics at North Carolina State University and now works as an analytics software tester at JMP. His dissertation dealt with computational methods in cheminformatics, chemometrics, and bioinformatics. He also holds an MS in statistics from North Carolina State University and a BS in biology from the University of Texas at Austin.
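For readers who want to see the general idea behind PCA/PLS-based model-driven control charts, the following is a minimal Python sketch, not the MDMVCC implementation itself, of the two statistics such charts typically monitor: Hotelling's T² on the model scores and the squared prediction error (SPE) of the X-block. The arrays `X_train`, `Y_train` (in-control process and quality measurements) and `X_new` are assumed placeholders.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Fit a PLS model on in-control ("normal operation") data; X = process variables,
# Y = quality variables. X_train, Y_train, X_new are assumed NumPy arrays.
x_mean, x_std = X_train.mean(axis=0), X_train.std(axis=0, ddof=1)
Xs = (X_train - x_mean) / x_std                      # autoscale the X-block
pls = PLSRegression(n_components=3, scale=False).fit(Xs, Y_train)

T_train = pls.transform(Xs)                          # X-scores of the training data
score_var = T_train.var(axis=0, ddof=1)

def monitor(X_new):
    """Return Hotelling's T^2 and SPE for new process measurements."""
    Xn = (X_new - x_mean) / x_std
    T = pls.transform(Xn)
    t2 = ((T ** 2) / score_var).sum(axis=1)                 # variation inside the model space
    spe = ((Xn - T @ pls.x_loadings_.T) ** 2).sum(axis=1)   # residual (unexplained) variation
    return t2, spe

# Simple empirical control limits taken from the in-control data
t2_lim, spe_lim = (np.percentile(s, 99) for s in monitor(X_train))
```

New observations exceeding either limit would be flagged, after which contribution-style diagnostics (as in the MDMVCC demonstration) are used to trace which process variables drive the signal.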
Level: Intermediate
Reporting, tracking, and analyzing the adverse events that occur in trial subjects are central to the safety assessment of a clinical trial. Many pharmaceutical companies, and the regulatory agencies to which they submit new drug applications, use JMP Clinical to support this evaluation of adverse events. Biometric analysis programming teams may prepare static tables, listings, and figures for medical monitors and reviewers. This leads to inefficiency: the physicians who understand the medical implications of a given event cannot interact directly with the adverse event summaries. Yet even producing simple counts and frequency distributions of adverse events is not always straightforward. This presentation focuses on JMP Clinical's core reports for adverse event counts, frequencies, incidence, and time to event. JMP Clinical's reporting capabilities go far beyond the ordinary, making fully dynamic adverse event analysis easy even when the underlying calculations rely heavily on JMP formulas, data filters, custom-scripted column switchers, and virtually joined tables.

Kelci Miclaus is a manager in JMP Life Sciences R&D, developing the statistical features of the JMP Genomics and JMP Clinical software. She joined SAS in 2006 and holds a PhD in statistics from North Carolina State University.
Level: Beginner
Title: JMP as a tool for spatial data and morphometric analysis: an attempt to grade carcinoma in situ of the upper aerodigestive tract and its precursor lesions using single-linkage cluster analysis

Most malignant tumors of the mucosa of the upper aerodigestive tract (oral cavity, pharynx, larynx, and so on) are squamous cell carcinomas arising in the stratified squamous epithelium that covers the mucosal surface. Lesions regarded as precursors or early stages of these carcinomas are known clinically to appear as white or red patches of the mucosa, and on microscopic examination of tissue taken from patients they are termed epithelial dysplasia or carcinoma in situ. Dysplasia is further divided into three grades (mild, moderate, and severe) according to the degree of cellular atypia and the proportion of the epithelial layer involved. This grading is performed intuitively by visual inspection by pathologists and is considered reasonably reproducible, but objective studies are few. In this work we quantified the arrangement of cells (nuclei) within the epithelial layer and examined how it differs among non-neoplastic (normal) epithelium, dysplasia, and carcinoma in situ. The centroid coordinates of cell nuclei were extracted from micrographs by digital image analysis, minimum spanning trees (MST) connecting the centroids were generated with single-linkage hierarchical cluster analysis in JMP ver. 15, and the histograms of branch lengths were compared; differences were found among the groups.

千場 良司: Former lecturer at Tohoku University (Institute of Development, Aging and Cancer). Doctor of Medical Science. Former overseas research fellow of the Ministry of Education (medicine) at Aarhus University, Denmark. In the field of human pathology he has studied the pathogenesis of disease using quantitative morphology based on geometric probability and integral geometry, digital image analysis, and multivariate statistical analysis. He has published research on liver cirrhosis, on early carcinomas and their precursor lesions arising in the alveolar epithelium, pancreatic duct epithelium, and endometrium, and on hepatic metastasis of cancer. (https://pubmed.ncbi.nlm.nih.gov/7804428/ , https://pubmed.ncbi.nlm.nih.gov/7804429/, https://pubmed.ncbi.nlm.nih.gov/8402446/ , https://pubmed.ncbi.nlm.nih.gov/8135625/, https://pubmed.ncbi.nlm.nih.gov/7840839/ , https://pubmed.ncbi.nlm.nih.gov/10560494/) He is interested in mathematical methods applicable to analyzing carcinogenesis and histologic diagnosis, with a particular interest in numerical classification methods such as cluster analysis and discriminant analysis. As statistical platforms he moved from Fortran statistical subroutines on mainframes through SPSS and SYSTAT on PCs, and has been a JMP user since version 8, drawn by its excellent data table functions and flexible analysis environment.

千場 叡: Graduated from the Department of Complex and Intelligent Systems, School of Systems Information Science, Future University Hakodate. As a student he was interested in complex-systems phenomena in physicochemical reactions and carried out experiments and research on the mechanism of capsule formation by thermally polymerized amino acids in an alcohol liquid phase. He is currently also interested in the recognition and classification of forms and images using digital image analysis, data science, and neural networks.
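The connection between single-linkage clustering and minimum spanning trees that this abstract relies on can be sketched outside JMP as well. Below is a rough illustration in which `coords` is assumed to hold the nucleus-centroid coordinates extracted by image analysis for one specimen; it builds the MST of the centroids and collects the branch lengths for histogram comparison.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

# coords: (n, 2) array of nucleus-centroid coordinates for one specimen (assumed input)
dist = squareform(pdist(coords))        # pairwise Euclidean distances between centroids
mst = minimum_spanning_tree(dist)       # sparse matrix whose nonzero entries are the MST edges
branch_lengths = mst.data               # the branch lengths to histogram and compare by group

# The merge heights of a single-linkage dendrogram are exactly these MST edge lengths,
# which is why single-linkage hierarchical clustering can be used to obtain the MST.
print(np.histogram(branch_lengths, bins=30))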
Level: Beginner
More than 1,000 of Doraemon's secret gadgets exist; they are ideal technologies used mainly to solve the problems of particular individuals. Today, however, solving the problems of society as a whole is more urgent than solving individual ones. In this study we used Doraemon's secret gadgets to examine solutions for the SDGs era, aiming to create social-problem-solving innovation. The data analyzed were survey responses about well-known gadgets such as the Take-copter and the Anywhere Door. We performed principal component analysis and preference regression, and then ran a conjoint analysis using an L8 orthogonal array on the extracted attributes and levels. As a result, we designed an ideal Doraemon gadget incorporating the seven elements people seek in a new gadget: improved intellectual ability, small gadget size, improved motor function, no time travel, no spatial travel, a time cost, and simple styling. In a survey on the new gadget we devised from these results, 92% of respondents gave the highly supportive answers "I would like to try it" or "I want it," and 100% answered "yes" to the question "Do you think it could improve the QOL of people with visual, hearing, or physical disabilities?"

新井 崇弘: After graduating from Chiba University, worked in management analysis at Chiba University Hospital, then completed the master's program (Master of Science in Health Care Management) at the Graduate School of Health Management, Keio University. Currently conducts research in the healthcare field through data analysis with JMP.
山口みなみ: 2013 Graduated from Tokyo Medical and Dental University, Faculty of Medicine, School of Health Care Sciences (nursing). 2013-2019 Worked as a nurse in neonatal care. 2019-present Graduate School of Health Management, Keio University (nursing).
洪 東方: 2017 Graduated from UNSW in life sciences (pathology major). 2018 Completed the Master of Public Health (MPH) at the University of Sydney. 2019-present Graduate School of Health Management, Keio University (healthcare management).
Level: Intermediate
As the phrase "a modern approach" in the title suggests, this book differs from conventional design-of-experiments texts in that it assumes the use of software (JMP). Author B. Jones is SAS's "Doctor DOE," and D. C. Montgomery is a professor at Arizona State University. In conventional DOE books the analysis results center on the ANOVA table; this book additionally shows JMP output such as the profiler, actual-by-predicted plots, effect summaries, and residual plots, so the reader can understand the material intuitively and from multiple angles. The book covers the standard topics, but its highlight is screening designs (Chapter 8). It makes quite specific recommendations about which design a practitioner should use, which is valuable. Even the difficult concept of resolution becomes easier to grasp with JMP's color map on correlations, alias matrix, and design generating rules, and the reader comes to see that the definitive screening design (DSD) is a reasonable choice for experiments dominated by continuous factors. The book is rich in other useful content as well: how results differ depending on the handling of randomization, replication, and blocking; how to deal with missing values, a thorny problem that often arises in practice; and the trade-off between orthogonality of main effects and optimal aliasing, together with its historical background. Split-plot designs (SPD) are also explained clearly.

After serving at Yamatake-Honeywell (now Azbil) as general manager of FA development, executive officer and head of the R&D division, executive officer and head of the quality assurance promotion division, and adviser at Azbil Kimmon, the presenter founded the consulting firm 東林コンサルティング. His specialties span statistical problem solving in general, including yield and quality improvement through production data analysis, field failure prediction, robust design, design optimization, and design of experiments, as well as on-site guidance in design review, root cause analysis (RCA), prevention of human error, and process improvement. His books include 『ネットビジネスの本質』 (日科技連出版, 2001, co-author; Telecom Social Science Award), 『実践ベンチャー企業の成功戦略』 (中央経済社, 2011, co-author), and 『よくわかる「問題解決」の本』 (日刊工業新聞社, 2014, sole author). His main paper is 「生産ラインのヒヤリハットや違和感に関する気づきの発信・受け止めを促進するワークショップの提案」 (Japanese Society for Quality Control, 2016; 2016 Quality Technology Award). Major talks include 「作業ミスを誘発する組織要因を可視化し改善を促進する仕組みの提案」 (Discovery-Japan 2018) and 「JMPによる品質問題の解決~製造業の不良解析と信頼性予測~」 (Discovery-Japan 2019).
Level: Beginner
樋口 侑夏, R&D Department, Unitec Foods Co., Ltd.; 浅野 桃子, R&D Department, Unitec Foods Co., Ltd.

In recent years Japanese consumers have been moving away from rice, and demand for bread as an alternative staple has been growing. Our company develops dough conditioners to improve the texture of bread, but sharing a bread texture in words and reaching a common understanding is very difficult, because the same texture term can mean different things to different people. We therefore reasoned that if texture characteristics obtained by sensory evaluation could be mapped in two dimensions, everyone could share the same understanding visually. In this study, using plain white bread, which is easy to evaluate by sensory methods, and melon bread, an application of it, as models, we selected and defined texture terms through statistical analysis and sensory evaluation and established a sensory evaluation scheme. This made it possible to evaluate bread characteristics, previously described differently by different people, on a common scale. We also created texture maps of the top five products by market share and were able to visualize their physical characteristics.

[Presenter profile] The presenters conduct R&D at Unitec Foods Co., Ltd., which sources hydrocolloids such as pectin, natural food ingredients, and functional ingredients from overseas suppliers and provides Japanese food manufacturers with the know-how to use them. They apply statistical analysis with JMP in work ranging from streamlining quality control to improving the precision of product development.
Level: Beginner
Texture is one of the key factors in how delicious a food is. Gelling agents give gel-type foods such as jellies and puddings their texture, and a wide variety of textures can be designed by adjusting the types of polysaccharides and their blending ratios. Sensory evaluation is essential for developing and proposing gelling agents that match manufacturers' needs and current trends. However, texture is perceived differently by different people, so choosing the evaluation terms subjectively risks overlooking appropriate terms and introduces strong assessor bias; the evaluation criteria also end up depending on the individual, which can lead to differing perceptions of a gelling agent's characteristics. In this study we selected the evaluation terms objectively using multiple correspondence analysis and standardized the evaluation criteria. By further applying multivariate analysis (principal component analysis and cluster analysis) to the sensory evaluation results, we positioned the textures of the gelling agents relative to one another and shared the result visually as a texture map.

The presenter conducts R&D at Unitec Foods Co., Ltd., which sources hydrocolloids such as pectin, natural food ingredients, and functional ingredients from overseas suppliers and provides Japanese food manufacturers with the know-how to use them, and works on product development and the creation of core technologies through statistical data analysis.
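As a rough illustration of how a texture map of this kind can be produced from panel data, the sketch below combines principal component scores with hierarchical clustering. It is hypothetical: `panel` is assumed to be a table of mean panel scores with one row per gelling-agent sample and one column per texture term, and the number of clusters is arbitrary.

```python
import pandas as pd
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster

# panel: rows = gelling-agent samples, columns = mean panel scores per texture term (assumed)
X = (panel - panel.mean()) / panel.std(ddof=0)         # standardize the attributes

scores = PCA(n_components=2).fit_transform(X)          # 2-D coordinates for the texture map
clusters = fcluster(linkage(X.to_numpy(), method="ward"), t=4, criterion="maxclust")

texture_map = pd.DataFrame(scores, index=panel.index, columns=["PC1", "PC2"])
texture_map["cluster"] = clusters                      # groups of samples with similar texture
print(texture_map)
```

Plotting PC1 against PC2, colored by cluster, gives the kind of relative positioning the abstract describes.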
Level: Beginner
早崎 将光, Senior Researcher, Environment Evaluation Group, Energy and Environment Research Division, Japan Automobile Research Institute; 伊藤 晃佳, Group Leader and Senior Researcher, Environment Evaluation Group, Energy and Environment Research Division, Japan Automobile Research Institute

Our main research themes are road traffic and the atmospheric environment, and the atmospheric environment and human health, so traffic volume is one of our key inputs. Cross-sectional traffic volume, one indicator of road traffic, is count data from vehicle detectors and similar sensors, published as five-minute values for each measurement point; about 2,400 such locations are currently published for Tokyo. Cross-sectional traffic volume is important as an areal indicator of traffic over a relatively wide region. The state of emergency declared over the spread of COVID-19 changed social and economic activity substantially, and road traffic is thought to have been affected as well. We analyzed the change in traffic volume in Tokyo before and after the state-of-emergency period, using cross-sectional traffic counts as the indicator, and also examined changes in air quality over the same period. JMP was our main analysis tool: its data table editing functions, such as table join and concatenate, and its graphs linked to the data allowed us to carry out the analysis efficiently. This report introduces our use of JMP.

堺 温哉: Completed the doctoral program of the United Graduate School of Agricultural Sciences, Ehime University (PhD in agriculture). After working as a JSPS postdoctoral fellow (PD) and holding positions at Hamamatsu University School of Medicine (educational affairs assistant), Yokohama City University School of Medicine (assistant professor), and Shinshu University School of Medicine (specially appointed assistant professor), he joined the Japan Automobile Research Institute in September 2012 as a senior researcher and has been in his current position since April 2020. His main research theme is air pollution epidemiology focused on Traffic Related Air Pollution (TRAP).
早崎 将光: Left the doctoral program of the Graduate School of Geoscience, University of Tsukuba, after completing the coursework (2000), and received a PhD in science from the university's Graduate School of Life and Environmental Sciences, Geoenvironmental Sciences (2006). After working as a postdoc and project researcher at the National Institute for Environmental Studies, the Center for Environmental Remote Sensing at Chiba University, the University of Toyama, Kyushu University, and the Atmosphere and Ocean Research Institute of the University of Tokyo, he has been in his current position since 2017. His main research theme is identifying the causes of high-concentration air pollution events.
伊藤 晃佳: Completed the doctoral program in environmental and resource engineering, Graduate School of Engineering, Hokkaido University, in March 2002 (PhD in engineering). Joined the Japan Automobile Research Institute in April 2002 and has been in his current position since 2010. Recent work includes evaluating source contributions to the atmospheric environment, analyses using ambient monitoring data (e.g., from continuous monitoring stations), and analyses using atmospheric simulation (e.g., CMAQ).

* No handout is available.
Level: Beginner
Title: Preventing healthcare collapse caused by resignations: an analysis of why healthcare workers leave their jobs and how their satisfaction changes before and after changing jobs

Healthcare collapse refers to a situation in which a stable, continuous system of healthcare delivery can no longer be maintained because of staff resignations, deteriorating hospital finances, and similar factors. To help prevent healthcare collapse caused by resignations, we have been analyzing the latent factors that lead people to leave their jobs. In this study we surveyed physicians, nurses, and hospital pharmacists who had left a job about the manifest reasons for leaving and the change in job satisfaction after changing jobs (satisfaction change rate), and compared occupations. The survey was administered to healthcare workers registered as panelists with a web survey company. To compare reasons for leaving across occupations and age groups, we drew a two-dimensional map from a correspondence analysis: male hospital pharmacists (under 37) and male physicians (under 37) were positioned in the same direction with respect to views on salary and career advancement, while female hospital pharmacists (under 37) and female physicians (under 37) were positioned in the same direction with respect to marriage and child-rearing. Next, a partition analysis of the change in job satisfaction after changing jobs showed that among hospital pharmacists, those without children, with relatively high income, under 37, and unmarried tended, if anything, to see their job satisfaction fall after changing jobs. Visualizing the factors associated with a drop in job satisfaction after a job change might help persuade at least some people to reconsider leaving.

[Career] 1993 Department of Pharmacy, Nippon Medical School Hospital; 2004 School of Pharmacy, Nihon University; 2013 Professor, Department of Pharmacy and Graduate School of Pharmaceutical Sciences, Teikyo Heisei University; 2014 Specially appointed professor (concurrent), Clinical Trial Center, Shinshu University Hospital. [Societies] Delegate, Pharmaceutical Society of Japan; delegate, Japanese Society of Pharmaceutical Health Care and Sciences; councilor and chair of the public relations committee, Japanese Society for Clinical Pathway; vice chair, committee for training clinical research pharmacists, Tokyo Society of Hospital Pharmacists; special committee member, Kanagawa Society of Hospital Pharmacists. [Other appointments] Member of the ethics review committee and certified clinical research review board, Tokyo Metropolitan Geriatric Hospital and Institute of Gerontology; IRB member, Nitobe Memorial Nakano General Hospital. [Qualifications] Preceptor pharmacist of the Japanese Society of Pharmaceutical Health Care and Sciences; licensed acupuncturist and moxibustionist; health information manager. [Hobbies] Forest-road cycling, camp cooking, swimming, bird photography.
Level: Intermediate
Before analyzing data, an analysis data set has to be prepared: extracting the necessary data, rearranging and reshaping the correspondence between variables, and performing variable transformations, categorization, and recategorization. JMP provides row and column extraction, database-style operations such as joins, and the computational functions needed for variable transformations, so an analysis data set can be created easily with these features. However, the analyst has to specify the extraction conditions and transformation instructions, and the more complex those instructions become, the greater the chance that the result is not what was intended. For example, when setting extraction ranges or categorizing with If expressions, the more complicated the "and"/"or" rules become, the more likely it is that the desired analysis data set has not actually been obtained. It is therefore necessary to check mechanically whether the analysis data set matches the analyst's intent. JMP's statistical methods can be used to verify the quality of an analysis data set. Checking the maximum and minimum of a variable is the simplest approach, but Distribution and Fit Y by X are also powerful; in Fit Y by X, an R-square of 1 serves as the evidence. In this presentation we report case studies in which the quality of analysis data sets was verified for large-scale data.

Lecturer, Department of Industrial Administration, Faculty of Science and Technology, Tokyo University of Science; principal researcher, Department of Chemical System Engineering, The University of Tokyo. His research specialty is statistical quality control. He mainly studies the statistical methods needed for quality control, and also applies statistical quality control to fire-resistant architecture, fire phenomena, healthcare, and nursing care, using JMP to model large-scale data, extract the information hidden behind it, and feed the results back to each application field.
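The kind of mechanical check described here, re-deriving a variable independently and confirming exact agreement (the analogue of seeing an R-square of 1 in Fit Y by X), might look like the following outside JMP. This is only a sketch; the table and column names are hypothetical, and the two tables are assumed to be row-aligned.

```python
import numpy as np
import pandas as pd

# raw_df: the source data; analysis_df: the derived analysis data set (assumed, row-aligned).
# Re-derive a categorized variable independently of the original extraction/If logic.
recomputed = pd.cut(raw_df["age"], bins=[0, 39, 64, np.inf], labels=["<40", "40-64", "65+"])

# Exact-agreement check: any mismatch means the analysis data set is not what was intended.
mismatch = analysis_df["age_class"].astype(str) != recomputed.astype(str)
print(mismatch.sum(), "rows disagree")
print(pd.crosstab(analysis_df["age_class"], recomputed))   # the cross-tab shows where they differ
```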
Level: Beginner
In recent years, business intelligence and people analytics, which analyze and visualize the data that companies accumulate daily and apply it to strategy and decision making, have been attracting attention. This presentation reports a case in which survey data on employees and organizations were analyzed with JMP and the results were used in consulting proposals to support management decisions. In management consulting, quantitative and qualitative organizational surveys are indispensable for understanding the actual state of an organization. To achieve the sustained organizational growth that comes from the growth of each employee, it is desirable to be able to visualize the state of the organization from these survey data and use it for prediction and decision making. In this case study, we apply JMP's multivariate analysis capabilities and a methodology based on visualization tools called analysis model diagrams and structural model diagrams to data obtained at Company A. As a result, clear, easily understood proposals can be made to Company A's management. The presentation walks through the entire flow from data acquisition to proposal, using analysis methods that go one step beyond ordinary descriptive statistics.

After working at a marketing company, an EAP (Employee Assistance Program) provider, and a venture company, the presenter became an independent organization and HR consultant. While engaged in organizational and human resource development work, the presenter entered a doctoral program as a working graduate student, studying analysis and design based on questionnaire surveys and questionnaire experiments, and has continued since completing the program to combine corporate practice with research centered on social science themes, currently serving as a specially appointed lecturer in the College of Business Management at J. F. Oberlin University, a director of the NPO GEWEL, and the representative of FREELY LLC, developing theory and applying it in practice. http://researchmap.jp/sho-kawasaki/

高橋 武則: Has studied QM (quality management), SQM (statistical quality management), and design theory for nearly 50 years. Since the start of the 21st century he has proposed the design paradigm Hyper Design, developed its underlying mathematics, HOPE theory, and has been developing its supporting software, the HOPE Add-in, jointly with SAS. The new design method is realized through the trinity of Hyper Design as the way of thinking, HOPE theory as the statistical mathematics, and HOPE-Add-in for JMP as the supporting tool. As a social science extension of this theory he has proposed multi-group principal component regression analysis.

橘 雅恵: Since opening a certified social insurance and labor consultant office, has focused on building HR systems and has supported more than 80 companies, in the belief that building the system best suited to each company inevitably requires organizational climate diagnosis and pay analysis based on employee interviews and surveys. Founded Japan Consulting Firm, a group of specialists supporting management as a whole, aiming to build a team that can find causal relationships from data rather than relying only on experience and intuition, extract management issues with high precision, and propose ways to improve business performance and develop organizations.
Level: Beginner
Experiments in industrial practice require large budgets, long lead times, and a great deal of labor, so they must deliver results reliably. This calls for a comprehensive set of capabilities, understood through experience and actually usable in practice:
① Understand the importance of reducing error variation; this requires knowing how error variation affects the results.
② Take up a small number of factors that are very likely to matter; this requires screening experiments.
③ Ensure the model (function) is not missing any terms; this requires a modeling experiment free of lack of fit (LOF).
④ Design is, in essence, optimization by mathematical programming; this requires capable, easy-to-use software.
⑤ Even when the statistical processing is correct, the solution often deviates from the target because of variation; this requires regression-based adjustment afterwards.
Acquiring all of this in a short time, safely and convincingly, requires hands-on training with virtual teaching materials. This study proposes a comprehensive, practical design-of-experiments curriculum based on a projectile-flight simulator, covering planning, running the experiments, data analysis, design (optimization), and post-hoc regression adjustment in concrete terms.

The presenter studied applied physics at university and, after joining a major control equipment manufacturer, was responsible for everything from fundamentals to applications in production: R&D of semiconductor devices for the company's products, development of core production technologies, construction of manufacturing lines, customer quality assurance, and ISO certification. He later studied optimization from a management perspective based on statistical quality control (SQC) in graduate school and obtained a doctorate in business administration. He is now strongly conscious of data management within that management perspective as information and communication technology advances; just as measurement technology merged with the internet and evolved into IoT, engineers who promoted SQC now work as data scientists. He began using JMP in his work in 2003 and immediately recognized its potential. He has presented on design of experiments and optimization at JMPer's Meetings and has presented at Discovery Summit Japan since 2016.

* A paper with further details is available; those interested should contact the presenter, Mr. Ogawa, directly.
Level: Beginner
In nonclinical studies in the pharmaceutical industry, drug effects are often demonstrated by hypothesis tests in studies with several dose levels, and multiple comparisons are essential for strict control of the type I error rate. The Dunnett multiple comparison procedure is widely used to compare a control group with several dose groups, but because it compares the control with every dose simultaneously, power drops in studies with many groups. The Williams multiple comparison procedure, which applies when the drug effect is monotone in dose, tests sequentially from the highest dose downward in a closed procedure, so it loses almost no power compared with a two-group t test; it is an attractive method, but unfortunately it is not currently available in JMP. We are therefore developing a JSL add-in that carries out the closed sequential testing from the highest dose according to Williams' method and reports significance by comparing the computed statistics with the published Williams critical value tables. We also plan to include the nonparametric version, the Shirley-Williams procedure. The presentation introduces the add-in's features, including a demonstration of the prototype.

A systems engineer at Takumi Information Technology, Inc., in charge of nonclinical business for the pharmaceutical industry since last year, developing statistical analysis software for nonclinical departments and working as a statistics seminar instructor and consultant. He worked for many years on in vitro and in vivo pharmacology studies in the pharmacology department of a pharmaceutical company and has experience with marketing approval applications. In 2007 he transferred to the research administration department of a group company, where he took charge of nonclinical statistical analysis and learned JMP from 芳賀敏郎, 高橋行雄, and others. He now mainly develops with the JMP Scripting Language (JSL) and is planning a series of nonclinical add-ins that extend JMP's functionality.
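For reference, the statistic that such an add-in computes can be sketched as follows. This is a simplified illustration only, assuming equal variances and an increasing dose response; as in the planned add-in, significance would still be judged against Williams' published critical-value tables, which are not reproduced here.

```python
import numpy as np

def williams_statistics(groups):
    """Williams-type statistics; groups[0] is the control, groups[1:] the ascending doses."""
    means = np.array([g.mean() for g in groups])
    ns = np.array([len(g) for g in groups])
    # Pooled within-group variance
    s2 = sum(((g - g.mean()) ** 2).sum() for g in groups) / (ns.sum() - len(groups))

    k = len(groups) - 1
    mu_hat = np.empty(k)                        # amalgamated (monotone) mean estimates
    for i in range(1, k + 1):
        mu_hat[i - 1] = max(
            min(np.dot(ns[u:v + 1], means[u:v + 1]) / ns[u:v + 1].sum()
                for v in range(i, k + 1))
            for u in range(1, i + 1))
    # Test sequentially from the highest dose downward (closed procedure),
    # comparing each statistic with Williams' tabulated critical values.
    return (mu_hat - means[0]) / np.sqrt(s2 * (1 / ns[1:] + 1 / ns[0]))
```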
Level: Intermediate
JMP is probably the most powerful and most systematically organized software for reliability data analysis. In this talk, using JMP's reliability/survival analysis platforms, we systematically introduce methods for analyzing life data, covering univariate distributions, bivariate relationships, prediction, and modeling within the time allowed. For modeling in particular, we plan to introduce reliability activities and the data analysis process through the renewal theorem and methods used in reliability testing, using hypothetical examples that do not stray far from real cases.

廣野 元久: Joined Ricoh Co., Ltd. in 1984 and has since worked on quality management and reliability management within the company and on promoting statistics. After serving as head of the QM Promotion Office in the Quality Division and head of the SF Business Center, he is in his current position. Part-time lecturer at the Faculty of Engineering, Tokyo University of Science (1997-1998) and the Faculty of Policy Management, Keio University (2000-2004). His main specialties are statistical quality control and reliability engineering. His books include 『グラフィカルモデリングの実際』, 『JMPによる多変量データの活用術』, 『アンスコム的な数値例で学ぶ統計的計算方法23講』, 『JMPによる技術者のための多変量解析』, and 『目からウロコの多変量解析』.

遠藤 幸一: Joined Toshiba Corporation in 1987. After product and process development of power ICs (power supply ICs, 500 V motor driver ICs, and so on), he now works on failure analysis technology. PhD in information science, Osaka University.

* No handout is available.
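As a small, generic illustration of the "univariate distribution" step in life data analysis (not the talk's own example), a two-parameter Weibull fit to complete failure times might look like this in Python. Censored data, acceleration models, and the renewal-process material covered in the talk need dedicated reliability tools such as JMP's platforms.

```python
import numpy as np
from scipy.stats import weibull_min

# times: NumPy array of complete (uncensored) failure times -- assumed input
shape, loc, scale = weibull_min.fit(times, floc=0)   # location fixed at 0 (2-parameter Weibull)
b10 = weibull_min.ppf(0.10, shape, scale=scale)      # B10 life: time by which 10% of units fail
print(f"shape (beta) = {shape:.2f}, scale (eta) = {scale:.1f}, B10 = {b10:.1f}")
```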
Level: Intermediate
With JMP (Pro) you can enjoy analysis far more easily than with R or Python. It may not give you fully bespoke analysis, but it handles semi-custom work more than adequately. With JMP the following are easy: ① analysis with the mouse alone, with no commands to type; ② graphs and statistics always come as a set; ③ the analysis process can be saved as a script; ④ reports can be produced that follow the flow of the analysis; ⑤ because statistical thinking underlies the product, it is ideal for systematic understanding and learning. In this talk I use numerical examples to discuss prediction and classification with JMP, covering methods such as kernel smoothing, SVM, and neural-network discriminant analysis, and I deepen the understanding by contrasting them with traditional statistical multivariate analysis.

Joined Ricoh Co., Ltd. in 1984 and has since worked on quality management and reliability management within the company and on promoting statistics. After serving as head of the QM Promotion Office in the Quality Division and head of the SF Business Center, he is in his current position. Part-time lecturer at the Faculty of Engineering, Tokyo University of Science (1997-1998) and the Faculty of Policy Management, Keio University (2000-2004). His main specialties are statistical quality control and reliability engineering. His books include 『グラフィカルモデリングの実際』, 『JMPによる多変量データの活用術』, 『アンスコム的な数値例で学ぶ統計的計算方法23講』, 『JMPによる技術者のための多変量解析』, and 『目からウロコの多変量解析』.

* No handout is available.
Level: Intermediate
With work-style reform much discussed these days, making experiments and analysis more efficient is increasingly important. To show the power of JMP for streamlining experiments, it works well to replace an existing set of experimental runs with a definitive screening design (DSD) or a custom design and show how drastically the number of runs can be reduced, picking the response data out of the existing experimental results. When experimental data are found scattered across several tables, combine them into a single table, run a multivariate analysis, and visualize it with the profiler, so that people notice the pitfalls of the OFAT (one factor at a time) approach. Against the habit of analyzing replicated experiments by their means, show that there are alternatives such as stacking the data, multi-objective optimization on mean and variance, and robust optimization. When design of experiments is used in development, the presence of interactions often cannot be predicted in advance, and interactions are by no means rare. A DSD has no confounding between main effects and two-factor interactions (2FI) and none among 2FIs, and the number of runs is only about twice the number of factors, which is a great advantage. I report on what I have learned from actually using DSDs, the breakdown that occurs when the number of main effects plus interaction terms approaches the number of factors, how augmenting the design resolves it, and the points that seem most important in practice from DSD-related papers obtained from the JMP Community and ASQ.

After serving at Yamatake-Honeywell (now Azbil) as general manager of FA development, executive officer and head of the R&D division, executive officer and head of the quality assurance promotion division, and adviser at Azbil Kimmon, the presenter founded the consulting firm 東林コンサルティング. His specialties span statistical problem solving in general, including yield and quality improvement through production data analysis, field failure prediction, robust design, design optimization, and design of experiments, as well as on-site guidance in design review, root cause analysis (RCA), prevention of human error, and process improvement. His books include 『ネットビジネスの本質』 (日科技連出版, 2001, co-author; Telecom Social Science Award), 『実践ベンチャー企業の成功戦略』 (中央経済社, 2011, co-author), and 『よくわかる「問題解決」の本』 (日刊工業新聞社, 2014, sole author). His main paper is 「生産ラインのヒヤリハットや違和感に関する気づきの発信・受け止めを促進するワークショップの提案」 (Japanese Society for Quality Control, 2016; 2016 Quality Technology Award). Major talks include 「作業ミスを誘発する組織要因を可視化し改善を促進する仕組みの提案」 (Discovery-Japan 2018) and 「JMPによる品質問題の解決~製造業の不良解析と信頼性予測~」 (Discovery-Japan 2019).
Level: Intermediate
Specific health checkups aim, from the standpoint of lifestyle-related disease prevention, to reduce the number of people aged 40 to 74 who meet the criteria for metabolic syndrome. A summary of checkup results over all examinees can serve as a benchmark for comparing an individual with the population as a whole, and so should be a useful reference for the health management of middle-aged and older individuals. The NDB open data provided by the Ministry of Health, Labour and Welfare give, for each fiscal year, the means and class-interval distributions of the checkup items (waist circumference, blood glucose, blood pressure, and so on), and for several of the items used to judge metabolic syndrome, the number of people outside the reference range and the number examined can be obtained by attributes such as sex and age group. Plotting the proportion outside the reference range for each attribute yields interesting results; for some items, for example, no clear age trend appears. In this presentation, along with such graphs, we show the results of fitting a generalized linear model to the out-of-range proportion for each item, with fiscal year, prefecture, sex, and age group as factors. This model makes it possible to predict the out-of-range proportion for a given combination of attributes (sex, age group, prefecture of residence) and to understand the specific health checkup population as a whole more deeply.

A technical engineer in the JMP Japan division, currently doing presales work on JMP products mainly for pharmaceutical and food companies. Although his role is to introduce JMP to customers, he is very much a JMP user himself. In recent years he has posted JMP analyses of topics in the news to his blog and to JMP Public, the sharing site for analysis reports. https://public.jmp.com/users/259
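A generalized linear model of the kind described, with the out-of-range proportion as the response, could be fit along the following lines. This is only a sketch: the column names are hypothetical, and `df` is assumed to hold one row per year-prefecture-sex-age-group cell with counts taken from the NDB open data for one screening item.

```python
import pandas as pd
import statsmodels.api as sm

# df: one row per (year, prefecture, sex, age_group) cell with n_out (outside the
# reference range) and n_in (within it) for one screening item -- assumed layout.
endog = df[["n_out", "n_in"]].to_numpy()                 # (successes, failures) counts
exog = pd.get_dummies(df[["year", "prefecture", "sex", "age_group"]].astype(str),
                      drop_first=True).astype(float)
exog = sm.add_constant(exog)

fit = sm.GLM(endog, exog, family=sm.families.Binomial()).fit()   # logistic GLM on proportions
print(fit.summary())

df["predicted_prop"] = fit.predict(exog)   # predicted out-of-range proportion per attribute cell
```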
Caleb King, JMP Research Statistician Tester, SAS   In this talk, we illustrate how you can use the wide array of graphical features in JMP, including new capabilities in JMP 16, to help tell the story of your data. Using the FBI database on reported hate crimes from 1991-2019, we’ll demonstrate how key tools in JMP’s graphical toolbox, such as graphlets and interactive feature modification, can lead viewers to new insights. But the fun doesn’t stop at graphs. We’ll also show how you can let your audience “choose their own adventure” by creating table scripts to subset your data into smaller data sets, each with their own graphs ready to provide new perspectives on the overall narrative. And don’t worry. Not all is as dark as it seems...     Auto-generated transcript...   Speaker Transcript Caleb King Hello, my name is Caleb King, I am a developer at the at the JMP software. I'm specifically in the design of experiments group, but today I'm going to be   a little off topic and talk about how you can use the graphics and some of the other tools in JMP to help with sort of the visual storytelling of your data.   Now the data set I've chosen is a...to illustrate that is the hate crime data collected by the FBI so maybe a bit of a controversial data set but also pretty relevant to what what's been happening.   And my goal is to make this more illustrative so I'll be walking through a couple graphics that I've created previously   for this data set. And I won't necessarily be showing you how I made the graphs but the purpose is to kind of illustrate for you how you can use JMP   for, like I said, visual storytelling so use the interactivity to help lead the people looking at the graphs and interacting with it   to maybe ask more questions, maybe you'll be answering some of their questions, as you go along. But kind of encourage that data exploration, which is what we're all about here at JMP.   So with that said let's get right in.   I'll first kind of give you a little bit of overview of the data set itself, so I'll kind of just scroll along here. So there's a lot of information about   where the incidents took place.   As we keep going, and the date, well, when that incident occurred. You have a little bit of information about the offenders   and what offense, type of offense was committed. Again some basic information, what type of bias was presented in the incident. Some information about the the victims.   And overall discrimination category, and then some additional information I provided about latitude and longitude, if it's available, as well as some population that I'll be using other graphics. Now just for the sake of, you know, to be clear,   the FBI, that's the United States Federal Bureau of Investigation, defines a hate crime is any   criminal offense that takes place that's motivated by biases against a particular group. So that bias could be racial, against religion, gender, gender identity and so forth.   So, as long as a crime is motivated by a particular bias, it's considered a hate crime, and this data consists of all the incidents that have been   collected by the FBI, going back all the way to the year 1991 and as recent as 2019. I don't have data from 2020, as interesting as that certainly would be,   but that's because the FBI likes to take its time and making sure the data is thoroughly cleaned and prepared before they actually create their reports. 
So   you can rest assured that this data is pretty accurate, given how much time and effort they put into making sure it is.   Alright, so with that let's kind of get started and do some basic visual summary of the data. So I'll start by running this little table script up here. And all this does is basically give us   a count over over the days, so each day, how many incidents occurred, according to a particular bias. From this I'm going to create a basic plot, in this case it's simple sort of line plot   here, showing the number of incidents that happen each day over the entire range. So you get the entire...you can see the whole range of the data 1991 to 2019   and how many incidents occurred. Now this in and of itself would probably a good static image, because you get kind of get a sense of   where the the number of incidents falls. In fact here I'm going to change the axis settings here a little bit. Let's see, we got increments in 50s, let's do it by 20s. There we go.   So there's a little bit of interactivity for you... interaction. We changed the scales to kind of refine it and get a better sense of   how many incidents, on average, there are. I ran a bit of a distribution and, on average, around 20 incidents per day   that we see here. Now of course, you're probably wondering why I have not yet addressed the two spikes that we see in the data.   So yes, there are clearly two really tall spikes. And so, if this were   any other type of software, you might say, okay, I'd look like to look into that. So you go back to the data you try and, you   know, isolate where those dates are and maybe try and present some plots or do some analysis to show what's going on there. Well, this is JMP and we have something that can   help with that, and it's something that we introduced in JMP 15 called graphlets and it works like this. I'm just going to hover and boom.   A little graphlet has appeared to help further summarize what's going on at that point. So in this case there's a lot of information.   We'll notice first the date, May 1, 1992. So if you're familiar with American history, you might know what's happening here, but if not,   you can get a little bit of an additional clue by clicking on the graph. So now you'll see that I'm showing you   the incidents by the particular bias of the incident. So we see here that most of the incidents were against white individuals and then next group is Black or African-American and it continues on down.   I kind of give away the answer here, in that the incidents that occurred around this time where the Rodney King riots in California. Rodney King,   an African-American individual who was unfortunately slain by a police officer and that led to a lot of backlash and rioting   around this time. So that's what we're seeing captured in this particular data point, and if you didn't know that, you would at least have this information to try to start and go looking there...looking online to figure out what happened.   We can do the same thing here with a very large spike. And again, I'll use the hover graphlet, so hover over it. I'll pause to let you look. So we look at the date, September 12, 2001. That's in it of itself a very big clue   as to what happened. But if we look here at the breakdown, we can see that most of the incidents were against individuals of Muslim faith, of   Arab ethnicity or some other type of similar ethnicity or ancestry. 
In this case, we can clearly see   that, after the unfortunate events of September 11, the terrorist attacks that occurred then, there was on the following day, a lot of backlash against   members who were of similar ethnicity, similar faith and so forth, so we had an unfortunate backlash happening at that time.   So already with just this one plot and some of that interactivity, we've been able to glean a lot of information, a lot of high level information in areas where you might want to look further.   But we can keep going. Now something new in JMP 16 is, because we have date here on the X axis, we can actually bin the dates into a larger category, so in this case let's bin it by month.   And we see that the plot disappears. So here's what I'm going to do. I'm going to rerun it and let's see.   There we go.   You never know what will happen. In this case, so this is what's supposed to happen; don't worry.   So we've binned it by month and we noticed an interesting pattern here. There seems to be some sort of seasonal trend occurring, and let's use the hover graphlets to kind of help us identify what might be happening. So I'm going to hover over the lower points. So if I do that,   we see okay, January, December, okay. Interesting. Let's hover over another one, December.   And yet another one, December. Ah, there might actually be some actual seasonal trend in this case going on. We seem to hit low points around the the winter months.   And in fact, if I go back to my data table, I've actually seen that before. It was something I kind of discovered while exploring that technique and I've already created a plot to kind of address that.   So this was something I created based off of that, kind of, look at, you know, what's the variation in the number of incidents over all the years within this month.   And here we can see them the mean trends, but we also see a lot of variation, especially here in September because of that huge spike there.   So maybe we need something a little more appropriate. So I'll open the control panel and hey, let's pick the the median. That's more robust and maybe look at the interquartile range, so that way we have a little bit more robust   metrics to play with.   And so, again, we see that seasonal trends, so it seems that there's definitely a large dip within the winter months   as opposed to peaking kind of in the spring and summer months. Now   this might be something someone might want to look further into and research why is that happening.   You might have your own explanations. My personal explanation is that I believe the Christmas spirit is so powerful that overcomes whatever whatever hate or bias individuals might have in December.   Again that's just my personal preference, you probably have your own. But again, with just a single plot, I was able to discover this trend and   make another plot to kind of explore that further. So again with just this one plot, I've encouraged more research. And we can keep going.   So let's see, let's bin it by year, and if we do that, we can clearly see this kind of overall trend.   So we see a kind of peak in the late '90s around early 2000s before dropping, you know, almost linear fashion,   until it hits its midpoint about in the mid 2010s before starting to rise again. So keep that in mind, you might see similar trends in other plots we show. But again, let's take a step back and just realize that in this one plot we've seen different   aspects of the data. 
We even   answered some questions, but we've also maybe brought up a few more. What's with that seasonal trend?   And if you didn't know what those events were that I told you, you know, what were those particular events? So that's the beauty of the interactivity of JMP graphics is it allows the user to engage and explore and encourages it all within just one particular medium.   All right.   Let's keep going. So I mentioned, this is sort of visual storytelling, so you can think of that sort of as a prologue, as sort of the the overall view. What's...what's   what's the overall situation? Now let's look at kind of the actors,   that is, who's committing these types of offenses? Who are they committing them against? What information can we find out about that? So here I've   created, again, a plot to kind of help answer that. Now this might be a good start. Here I've created a heat map   that then emphasizes the the counts by, in this case, the offender's race versus their particular bias. So we see that   a lot of what's happening, in this case I've sorted the columns so we can see there's a there's a lot going on. Most of its here in this upper   left corner and not too much going on down here, which I guess is good news. There's a lot of biases where there's not a lot happening, most of it's happening here in this corner.   Now, this might be a good plot, but again there's a lot of open space here. So maybe we can play around with things to try and emphasize what's going on. So one way I can do that is I'll come here to the X axis and I'm going to size by the count.   Now you'll see here, I had something hidden behind the scenes. I'd actually put a label, a percentage label on top of these.   There was just so much going on before that you couldn't even see it, but now we can actually see some of that information. So kind of a nice way to summarize it as opposed to counts.   But even with just the visualization, we can clearly see the largest amount of bias is against Black or African American citizens and then Jewish and on down until there's hardly any down here. So just by looking at the X axis, that gives you a lot of information about what's going on.   We can do the same with the Y, so again, size by the count.   And again, there's a lot of information contained just within the size and how I've adjusted the the axes.   And this case we include...we've really emphasized that corner, so we can clearly see who the top players are. In this case, most of it is   offenders are of white or unknown race against African-Americans, the next one being against Jewish, and then   anti white and then it just keeps dropping down. So we get a nice little summary here of what's going on. Now, you may have noticed that as I'm hovering around, we see our little circle. That's my graphlet indicator, so again I've got a tool here.   We've we've interacted a little bit and again, this could be a great static image, but let's use the power of JMP, especially those graphslets,   to interact and see what further information we can figure out. So in this case, I'll hover over here.   And right here, a graph, in this case, a packed bar chart, courtesy of our graph guru Xan Gregg. In this case, not only can you see, you know,   what people are committing the offenses and against whom, your next question might have been, you know, what types offenses are being committed? Well, with a graphlet, I've answered that for you.   
We can see here the largest...the overwhelming type of offense is intimidation, followed by simple and aggravated assault, and then the rest of these, that's the beauty of the packed bar chart.   We can see all the other types offenses that are committed. If you stack them all on top of each other, they don't even compare. They don't even break the top three.   So that tells you a lot about the types of...these types of offenses, how dominant they are.   Now, another question you might have is, okay, we've seen the actors, we've seen the actions they're taking,   but there's a time aspect to this. Obviously this is happening over time, so has this been a consistent thing? Has there been a change in the trends? Well,   have no fear. Graphlets again to the rescue. In this case, I can actually show you those trends. So here we can see how has the   types of...the number of intimidation incidents changed over time? And again, we see that the pattern seems to follow what that overall trend was.   A peak in the like, late 90s, and then the steep trend...almost linear drop until about the mid 2010s, before kind of upticking again more recently.   And again we can maybe see that trend and others. I won't click to zoom in, but you can just see from the plot here, those trends in simple assault here and aggravated assault as well, a little bit there.   And you can keep exploring. So let's look at the unknown against African-Americans and see what difference there might be there. In this case, we can clearly see   that there are two types of offenses that really dominate, in this case, destruction or damage to property (which, if you think about it, might make sense; if you see your property's been damaged, there's a good chance you may not know who did it)   and intimidation are the dominant ones. And again, you can...the nice thing about this is the hover labels kind of persist, so you can again look and see what trends are happening there.   So in this case, we see with damage, there's actually two peaks, kind of peaked here in the late '90s early 2000s, before dropping again. And with intimidation, we see a similar trend as we did before.   Again within just one graphic, there's a lot of information contained and that you, as the user, can interact with to try and emphasize certain key areas, and then you, as the user, just visualize...just looking at this and interacting with it, can play around and glean a lot of information.   All right.   And let's keep going. Now you'll notice that amongst the reporting agencies, so, most of them are city/county level   police departments and so forth, but there's also some universities in here. So there might be someone out there who might be interested in seeing, you know, what's happening at the universities.   And so, with that, I've created this nice little table script to answer that. Now this time,   I've been just running the table scripts and I mentioned, I won't go too much behind the scenes, this is more illustrative.   Here I'm going to let you take a peek, because I want to not only show you the power of the graphics but also the power of the table script. Now if you're familiar with JMP,   you might know, okay, the table script's nice because I can save my analyses, I can save my reports, I can even use it to save graphics like I...like I did in the last one, so you may not have noticed that you can also save   scripts to help run additional tables and summary tables and so forth. 
So let me show you what all is happening behind here, in fact, when I ran the script, I actually created two data tables.   You only saw the one, so in this case I first created the data table that selected all the universities and then from that data table it created a summary and then I close the old one.   And then I also added to that some of the graphics. So I won't go into too much detail here about how I set this up, because I want to save that for after the next one. I'll give you a hint. It's based off of a new feature in JMP 16 that will really amaze you.   All right, let's go back to...excuse me...university incidents.   And here again I've saved the table script. This one that will show us a graphic.   So here we can see again is that packed bar chart, and here I'm kind of showing you which universities had the most incidents. Now again, this in and of itself might be a pretty good standard graphic.   You can see that, you know, which university seem to have the most incidents happening and again it's kind of nice to see that there's no real dominating one. You can still pack the other universities   on top of them, and nobody is dominating one or the other. So that in and of itself is kind of good news, but again there's a time aspect to this. So   have these been maybe... has the University of Michigan Ann Arbor, have they had trouble the entire time? Have they...would they have always remained on top? Did they just happen to have a bad year? Again, graphlets to the rescue.   In this case,   you'll see an interesting plot here. You might say, you know, what what is this thing? This looks like it belongs in an Art Deco museum. What...   what kind of plot is this? Well, it's actually one we've seen before. I'm just using something new that came out in JMP 16, so I'm going to give you a behind the scenes look.   And in this case, we can see, this is actually a heat map. All I've done   is I do a trick that I often like to do, which is to emphasize things two different ways, so not only emphasizing the counts by color, which is what you would typically do in a heat map, the whites are the missing entries, I can also now in JMP 16 emphasize by size.   And so I think this again gets back to where we size those axes before. It emphasizes...helps emphasize certain areas. So here we can see now maybe there's a little bit of an issue with incidents against African-Americans,   that has been pretty consistent, with an especially bad year in apparently 2017, as opposed to all of the other incidents that have been occurring.   Now there's no extra hover labels here.   All I'll do is summarize the data, but that's okay.   This in and of itself gives you a lot of information, so this is a new thing that came out in JMP 16 that can again help with that emphasis.   And again, we can keep going. We can look at other universities, so here, this might be an example of a university where they seem to have a pretty bad period of time,   the University of Maryland in College Park, but then there was an area where   things were really good, and so you might be interested in knowing, well, what happened to make this such a great period?   Is there something the university instituted, what they did that seemed to cause the count, the number of incidents to drop significantly? That might be something worth looking into.   
And you can keep going and looking again to see whether it's a systemic issue, whether like, in this case, it seemed there's just a really bad year that dominated, overall they were just doing okay.   They were doing pretty good. Again, this might be another one. They had a really bad time early on, but recently they've been doing pretty good,   and so forth. So again, kind of highlighting that interactivity yet again,   and in this case, with some of the newer features in JMP 16. Now, before we transition to the last one, I have a confession. I'm a bit of a map nerd, so I really like maps and any type of data analysis that, you know, relates to maps.   I don't know why. I just really like it and so I'm really excited to show you this next one, because now we look at the geography of the incidents.   But I'm also excited because this really, I believe, highlights the power of both the table scripts and the JMP graphics, especially the hover graphs.   So hopefully that got you excited as well, so let's run it. Now this one's going to take a little while because there's actually a lot going on with this table script. It's creating a new table. It's also doing a lot of functions in that table   and computing a lot of things. So here we've got not just, you know, pulling in information but also there's a lot of these columns here near the end that have been calculated behind the scenes.   Now I have to take a brief moment to talk about a particular metric I'm going to be using. So a while back, I wrote a blog entry called the Crime Rate Conundrum on on the JMP Community (community.jmp.com),   so shameless plug there, but in that I talked about how, you know, typically when you're reporting incidents, especially crime incidents,   usually we kind of know that you don't just want to report the raw counts, because   there might be a certain area where it has a high number of counts, a high number of incidents, is that just because that's...there's a problem at problem there?   Or is it because there's just a lot of people there? And so we, of course, would expect a lot of incidents because there's just a lot of people. So of course people   report incidents rates. Now that's fine because everybody's now on a level playing field but one side effect of that is it tends to elevate   places that have small populations. Essentially, you have, if you have small denominator, you will tend to have a larger ratio just because of that.   And so that's sort of an unfortunate side effect, and so there, I talk about an interesting case where we have a place with a really small population that gets really inflated.   And how some people deal with that. One way I tried to address that was through this use of a weighted incident rate, essentially, the idea is   I take your incident rate, but then I weight you by, sort of, what proportion...   excuse me...basically a weight by how many people you have there. In this case, I have a particular weight, I basically rank the populations,   so that the the largest place would have rank of of the smallest. However, in this case there's 50 states, so the state with the largest population would have a   rank of 50 and the smallest state a rank of one. 
If you take that and divide that by you know the maximum rank, that's essentially your weight so it's it's a way to kind of put   a weight corresponding to your total population and the idea here is that, if your incident rate is such that it overcomes this weight penalty, if you will,   then that means that you might be someone worth looking into. So it tries to counteract   that inflation, just due to a small population. If you are still relatively small, but your incident rate is high enough that you overcome your weight essentially,   we might want to look into you. So hopefully that wasn't too much information, but that's the metric that I'll be primarily using so I'll run the script   and   here we go. So first I've got a straightforward line plot that kind of shows the weighted incident rates over time for all the states.   Now I'll use a new feature here. We can see here that New Jersey seems to dominate. Again interactivity, we can actually click to highlight it.   There's some new things that we do, especially in JMP 16. I'm going to right click here and I'm going to add some labels. So let's do the maximum value and let's do the last value   just for comparison.   So here we can see this...the peak here was about 11.4 incidents per 1,000 (that's a weighted incident rate) here in sort of the early '90s.   And then we see a decreasing trend, again it seems to drop about the same that all the the overall incident rate did before starting to peak again here in   2019. So again with just some brief numbers again this, in and of itself, would be an interesting plot to look at, but as you could see, my little graphlet indicator is going, so there's more.   Here's where the the map part comes in. So I'm going to hover over a particular point.   In this case, not only   can you see sort of the overall rate, I can actually break it down for you, in this case by county. So here I've colored the   counties by the total number of incidents within that year. And again, there's that time aspect, so this shows you a snapshot for one particular year, in this case 2008.   But maybe you're interested in the overall trend, so one one way you could do that is, hey, these are graphlets. I could go back, hover over another spot, pull up that graph, click on it to zoom,   repeat as needed. You could do that or you could use this new trick I found actually while preparing this presentation.   Let's unhide...notice over here to the side, we have a local data filter. That's really the key behind these graphlets.   I'm going to come here to the year and I'm going to change its modeling type to nominal, rather than continuous, because now, I can do something like this. I can actually go through   and select individual years or, now this is JMP, we can do better.   Let me go here and I'm going to do an animation. I'm going to make it a little fast here. I'm going to click play, and now I can just sit back relax and, you know, watch as JMP does things for me.   So here we can see it cycle through and getting a sense of what's happening. I'll let it cycle through a bit. We see...already starting to see some interesting things happening here.   Let's let it cycle through, get the full picture, you know. We want the complete picture, not that I'm showing off or anything.   Alright, so we've cycled through and we noticed something. So let's let's go down here to about say 2004, 2005. So somewhere around here, we noticed this one county here, in particular, seems to be highlighted.   
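[Editor's note: in case it helps to see the weighted incident rate described above written out, one reading of the speaker's description is the sketch below; the column names are hypothetical, and `df` is assumed to hold one row per state for a given year.]

```python
# One row per state: incidents and population are assumed columns.
df["rank"] = df["population"].rank()                # smallest population -> rank 1
df["weight"] = df["rank"] / df["rank"].max()        # most populous state -> weight 1
df["weighted_rate"] = 1000 * df["incidents"] / df["population"] * df["weight"]
```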
And in fact, you saw my little graphlet indicator. So again, I can hover over it, and here   yet another map. Now you can see why I'm so excited.   Again, in this case, I can actually show you at the county level, so the individual county level...   Excuse me, let me...let's move that over a bit. There we are. Some minor adjustments and again, you can see my trick of emphasizing things two ways by both size and color.   We can kind of see dispersion within the ???, this is individual locations and because there's that time aspect again,   we know...now we know better, we don't have to go back and click and get multiple graphs, we can again use the local data filter tricks. So I can go back. I'll do   the year, and so in this case, we can again click through. Here I'm just going to use the arrow keys on my keyboard to kind of cycle through.   And just kind of get a sense of how things are varying over time. In this case, you see a particular area, you've probably already seen it, starting about 2006, 2007ish frame.   There's this one area...this here.   Keansburg, which seems to be highlighted and you'll notice yet another graphlet. How far do you want to go?   Graphception, if you will.   We can   keep going down further and further in. In this case, I get...I break it out by what the bias was, and again I could do that trick if I wanted to, to go through and cycle through by year.   So, again so much power in these graphs. With this one graphlet, I was able to explore geographical variation at county level and even further below, and so it might be   allowing you to kind of explore different aspects of the data, allowing you to generate more questions. What was happening in Keansburg around this time to make it pop like this?   That's something you might want to know.   So that's all I have for you today, hopefully I've whet your appetite and was able to clearly illustrate for you how powerful the the JMP visualization is in exploring the data.   If you want to know more, there's going to be a plenary talk on data viz. I definitely encourage you to explore that and it kind of helps address different ways of visualization and how JMP can help out with that.   But I did promise you, at one point, to give you a peek as to how I was able to create these pretty amazing table scripts and I'll do that right now.   It's called the enhanced log now in JMP 16. This is one of the coolest new features in JMP 16. Enhanced log actually follows along as you interact and it keeps track of it. And so whenever I closed, in this case, closed a data table, opened a data table, ran a data table,   if I added a new column, if I created a new graph, it gets recorded here in the log.   This is something that John Sall will be talking about in his plenary talk. It's, again, one of the most new amazing features here.   And this is the key to how I was able to create these tables scripts. I can honestly say that if this hadn't been present, I probably wouldn't have been able to create these pretty cool table scripts, because it'd be a lot of work to do.   So again, this is a really cool feature that's available in JMP 16. So I hope I was able to convince you that JMP is a great tool for exploring data, for creating awesome visualizations, interactive visualizations. And that's all I have. Thank you for coming.
Ross Metusalem, JMP Systems Engineer, SAS   Text mining techniques enable extraction of quantitative information from unstructured text data. JMP Pro 16 has expanded the information it can extract from text thanks to two additions to Text Explorer’s capabilities: Sentiment Analysis and Term Selection. Sentiment Analysis quantifies the degree of positive or negative emotion expressed in a text by mapping that text’s words to a customizable dictionary of emotion-carrying terms and associated scores (e.g., “wonderful”: +90; “terrible”: -90). Resulting sentiment scores can be used for assessing people’s subjective feelings at scale, exploring subjective feelings in relation to objective concepts and enriching further analyses. Term Selection automatically finds terms most strongly predictive of a response variable by applying regularized regression capabilities from the Generalized Regression platform in JMP Pro, which is called from inside Text Explorer. Term Selection is useful for easily identifying relationships between an important outcome measure and the occurrence of specific terms in associated unstructured texts. This presentation will provide an overview of both Sentiment Analysis and Term Selection techniques, demonstrate their application to real-world data and share some best practices for using each effectively.     Auto-generated transcript...   Speaker Transcript Ross Metusalem Hello everyone, and thanks for taking some time to learn how JMP is expanding its text mining toolkit in JMP Pro 16 with   sentiment analysis and term selection.   I'm Ross Metusalem, a JMP systems engineer in the southeastern US, and I'm going to give you a sneak preview of these two new analyses, explain a little bit about how they work, and provide some best practices for using them.   So both of these analysis techniques are coming in Text Explorer, which for those who aren't familiar, this is JMP's   tool for analyzing what we call free or unstructured texts, so that is natural language texts.   And it's a what we call a text mining tool, so that is a tool for deriving quantitative information from free text so that we can   use other types of statistical or analytical tools to derive insights from the free text or maybe even use that text as inputs to other analyses.   So let's take a look at these two new text mining techniques that are coming to Text Explorer in JMP Pro 16, and we'll start with sentiment analysis.   Sentiment analysis at its core answers the question how emotionally positive or negative is a text.   And we're going to perform a sentiment analysis on the Beige Book, which is a recurring report from the US Federal Reserve Bank.   Now apologies for using a United States example at JMP Discovery Europe, but the the Beige Book does provide a nice demonstration of what sentiment analysis is all about.   So this is a monthly publication, it contains national level report, as well as 12 district level reports, that summarize economic conditions in those districts, and all of these are based on qualitative information, things like interviews and surveys.   And US policymakers can use the qualitative information in the Beige Book, along with quantitative information, you know, in traditional economic indicators to drive economic policy.   So you might think, well, we're talking about sentiment or emotion right now. I don't know that I expect economic reports to contain much emotion, but the Beige Book reports and much language does actually contain words that can carry or convey emotion.   
So let's take a look at an excerpt from one report. Here's a screenshot straight out of the new sentiment analysis platform. You'll notice some words highlighted, and these are what we'll call sentiment terms, that is, terms that we would argue have some emotional content to them. For example at the top, "computer sales, on the other hand, have been severely depressed," where "severely depressed" is highlighted in purple, indicating that we consider that to convey negative emotion, which it seems to; if somebody describes computer sales as "severely depressed" it sounds like they mean for us to interpret that as certainly a bad thing. If we look down, we see in orange a couple positive sentiment terms highlighted, like "improve" or "favorable." So we can look for words that we believe have positive or negative emotional content and highlight them accordingly, purple for negative, orange for positive, and some sentiment analysis keeps things at that level, so just a binary distinction, a positive text or a negative text. There are additional ways of performing sentiment analysis and, in particular, many ways try to quantify the degree of positivity or negativity, not just whether something is positive or negative. So consider this other example and I'll point our attention right to the bottom here, where we can see a report of "poor sales." And I'm going to compare that with where we said "computer sales are severely depressed." So both of these are negative statements, but I think we would all agree that "severely depressed" sounds a lot more negative than just "poor" does. So we want to figure out not only is a sentiment expressed positive or negative, but how positive or negative is it, and that's what sentiment analysis in Text Explorer does. So how does it do it? Well, it uses a technique called lexical sentiment analysis that's based on some sentiment terms and associated scores. So what we're seeing right now is an excerpt from what we'd call a sentiment dictionary that contains the terms and their scores. For example, the term "fantastic" has a score of positive 90 and the term "grim" at the bottom has a score of -75. So what we do is specify which terms we believe carry emotional content and the degree to which they're positive or negative on an arbitrary scale, here -100 to 100. And then we can find these terms in all of our documents and use them to score how positive or negative those documents are overall. If you think back to our example "severely depressed," that word "severely" takes the word "depressed" and, as we call it, intensifies it. It is an intensifier or a multiplier of the expressed sentiment, so we also have a dictionary of intensifiers and what they do to the sentiment expressed by a sentiment term. For example, we say "incredibly" multiplies the sentiment by a factor of 1.4 whereas "little" multiplies the sentiment by a factor of .3, so it actually, kind of, you know, attenuates the sentiment expressed a little. Now, finally there's one other type of word we want to take into account and that is negators, things like "no" and "not," and we treat these basically as polarity reversals. So "not fantastic" would be taking the score for "fantastic" and multiplying it by -1. And so, this is a common way of doing sentiment analysis, again called lexical sentiment analysis.
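[Editor's note: to make the lexical scoring rule just described concrete, here is a small self-contained sketch. The dictionaries are toy values in the spirit of the examples given in the talk, not JMP's built-in dictionary, and the negation window is a simplification.]

```python
import re

# Toy dictionaries; scores and multipliers are illustrative, not JMP's built-in values.
SENTIMENT = {"fantastic": 90, "favorable": 60, "improve": 40,
             "poor": -40, "depressed": -60, "grim": -75}
INTENSIFIERS = {"incredibly": 1.4, "severely": 1.4, "little": 0.3}
NEGATORS = {"no", "not"}

def score_document(text):
    """Average lexical sentiment score of one document."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = []
    for i, w in enumerate(words):
        if w in SENTIMENT:
            s = SENTIMENT[w]
            prev = words[i - 1] if i else ""
            if prev in INTENSIFIERS:                      # "severely depressed" -> intensified
                s *= INTENSIFIERS[prev]
            if prev in NEGATORS or (i > 1 and words[i - 2] in NEGATORS):
                s *= -1                                   # "not fantastic" -> polarity reversal
            scores.append(s)
    return sum(scores) / len(scores) if scores else 0.0

print(score_document("Computer sales have been severely depressed, but margins improve."))
```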
So what we do is we take sentiment terms that we find, we multiply them by any associated intensifier   or negators and then for each document, when we have all the sentiment scores for the individual terms that appear, we just average across all them to get a sentiment score for that document.   And JMP returns these scores to us in a number of useful ways. So this is a screenshot out of the sentiment analysis tool and we're going to be, you know, using this in just a moment.   But you can see, among other things, it gives us a distribution of sentiment scores across all of our documents. It gives us a list of all the sentiment terms and how frequently they occur.   And we even have the ability, as we'll see, to export the sentiment scores to use them in graphs or analyses.   And so I've actually made a couple graphs here to just try to see as an initial first pass, does the sentiment in the Beige Book reports actually align with economic events in ways that we think it should? You know, do we really have some validity to this   sentiment as some kind of economic indicator? And the answer looks like, yeah, probably.   Here I have a plot that I've made in Graph Builder; it's sentiment over time, so all the little gray dots are the individual reports   and the blue smoother kind of shows the trend of sentiment over time with that black line at zero showing neutral sentiment, at least according to our scoring scheme.   The red areas are   times of economic recession as officially listed by the Federal Reserve.   So you might notice sentiment seems to bottom out or there are troughs around recessions, but another thing you might notice   is that actually preceding each recession, we see a drop in sentiment either months or, in some cases, looks like even a couple years, in advance. And we don't see these big drops in sentiment   in situations where there wasn't a recession to follow. So maybe there's some validity to Beige Book sentiment as a leading indicator of a recession.   If we look at it geographically, we see things that make sense too. This is just one example from the analysis. We're looking at sentiment in the 12   Federal Reserve districts across time from 1999 to 2000 to 2001. This was the time of what's commonly called the Dotcom Bust, so this is when   there was a big bubble of tech companies and online businesses that were popping up and, eventually, many of them went under and there were some pretty severe economic consequences.   '99 to 2000 sentiment is growing, in fact sentiment is growing pretty strongly, it would appear,   in the San Francisco Federal Reserve district, which is where many of these companies are headquartered. And then in 2001 after the bust, the biggest drop we see all the way to negative sentiment in red here, again occurring in that San Francisco district.   So, just a quick graphical check on these Beige Book sentiment scores suggests that there's some real validity to them in terms of their ability to track with, maybe predict, some economic events, though of course, the latter, we need to look into more carefully.   But this is just one example of the potential use cases of sentiment analysis and there are a lot more.   One of the biggest application areas where it's being used right now is in consumer research, where people might, let's say, analyze   some consumer survey responses to identify what drives positive or negative opinions or reactions.   
But sentiment analysis can also be used in, say, product improvement where analyzing product reviews or warranty claims might help us find product features or issues that elicit strong emotional responses in our customers.   Looking at, say, customer support, we could analyze call center or chats...chat transcripts to   find some support practices that result in happy or unhappy customers. Maybe even public policy, we analyze open commentary to gauge the public's reaction to proposed or existing policies.   These are just a few domains where sentiment analysis can be applied. It's really applicable anywhere you have text that convey some emotion and that emotion might be informative.   So that's all I want to say up front. Now I'd like to spend a little bit of time walking you through how it works in JMP, so let's move on over to JMP.   Here we have the Beige Book data, so down at the bottom, you can see we have a little over 5,000 reports here, and we have the date of each report from   May 1972 October 2020, which of the 12 districts it's from, and then the text of the report. And you can see that these reports,   they're not just quick statements of you know, the economy is doing well or poorly, they they can get into some level of detail.   Now, before we dive into these data, I do just want to say thank you to somebody for the idea to analyze the Beige Book and for actually pulling down the data and getting it into JMP, in a   format ready to analyze. And that thanks goes to Xan Gregg who, if you don't know, Xan is a director on the JMP development team and the creator of Graph Builder, so thanks, Xan,for your help.   Alright, so let's let's quantify the degree of positive and negative emotion in all these reports. We'll use Text Explorer under the analyze menu.   Here we have our launch window. I'll take our text data, put it in the text columns role. A couple things to highlight before we get going.   Text Explorer supports multiple languages, but right now, sentiment analysis is available only in English, and one other thing I want to draw attention to is stemming right here.   So for those who do text analysis you're probably familiar with what stemming is, but for those who aren't familiar, stemming is a process whereby we kind of collapse multiple...   well, to keep it nontechnical...multiple versions of the same word together. Take "strong," "stronger," and "strongest." So these are three   versions of the same word "strong" and in some text mining techniques, you'd want to actually combine all those into one term and just say, oh, they all mean "strong" because that's kind of conceptually the same thing.   I'm going to leave stemming off here actually, and it's because...take "strongest," that describes as strong as something can get   versus "stronger," which says that you know it's strong, but there are still, you know, room for it to be even stronger.   So "strongest" should probably get a higher sentiment score than "stronger" should, and if I were to stem, I would lose the distinction between those words. Because I don't want to lose that distinction, I want to give them different sentiment scores. I'm going to keep stemming off here.   So I'll click OK.   And JMP now is going to tokenize our text, that is break it into all of its individual words and then count up how often each one occurs.   And here we have a list of all the terms and how frequent they are. So "sales" occurs over 46,000 times and we also have our phrase list over here. 
So the phrases are   sequences of two or more words that occur a lot, and sometimes we actually want to count those as terms in our   analysis. And for sentiment analysis, you would want to go through your phrase list and, let's say, maybe add "real estate," which is two words, but it really refers to, you know, property.   And I could add that. Now normally in text analysis, we'd also add what are called stop words,   that's words that don't really carry meaning in the context of our analysis and we'd want to exclude. Take "district." This happen...or the Beige Book uses   the word "district" frequently, just saying, you know, "this report from the Richmond district," something like that, it's not really meaningful.   But I'm actually not going to bother with stop words right here and that's because, if you remember,   back from our slides, we said that all of our sentiment scores are based on a dictionary, where we choose sentiment words and what score they should get.   And if we just leave "district" out, it inherently gets a score of 0 and doesn't affect our sentiment score, so I don't really need to declare it as a stop word.   So once we're ready, we would invoke text or, excuse me, sentiment analysis under the red triangle here.   So what JMP is doing right now, because we haven't declared any sentiment terms or otherwise, it's using a built-in sentiment dictionary to score all the documents. Here we get our scores out.   Now before navigating these results, we probably should customize our sentiment dictionary, so the sentiment bearing words and their scores. And that's because in different domains,   maybe with different people generating the text, certain words are going to bear different levels of sentiment or bear sentiment in one case and not another. So we want to   really pretty carefully and thoroughly create a sentiment dictionary that we feel accurately captured sentiment as it's conveyed by the language we're analyzing right now.   So JMP, like I said, uses a built-in dictionary and it's pretty sparse. So you can see it right here, it has some emoticons,   like these little smileys and whatnot, but then we have some pretty clear sentiment bearing language, like "abysmal" at -90.   Now it's it's probably not the case that somebody's going to use the word "abysmal" and not mean it in a negative sense, so we feel pretty good about that. But, you know, it's not terribly long list and we may want to add some terms to it.   So let's see how we do that, and one thing I can recommend is just looking at your data. You know, read some of the documents that you have and try to find some words that you think might be indicative of sentiment.   We actually have down here a tool that lets us peruse our documents and directly add sentiment terms from them. So here, I have a document list. You can see Document 1 is highlighted and then Document 1 is displayed below. I could select different documents to view them.   Now, if we look at Document 1, right off the bat, you might notice a couple potential sentiment terms like "pessimism" and "optimism."   Now you can see these aren't highlighted. These actually aren't included in the standard sentiment dictionary.   And a lot of nouns you'll find actually aren't, and that's because nouns like "pessimism" or "optimism" can be described in ways that suggests their presence or their absence, basically. So I could say, you know, "pessimism is declining" or   "there's a distinct lack of pessimism," "pessimism is no longer reported."   
And, in those cases, we wouldn't say "pessimism" is a negative thing. It's...so you want to be careful and think about words in context and how they're actually used before adding them to a sentiment dictionary.   For example, I could go back up to our term list here. I'm just going to show the filter,   look for "pessimism" and then show the text to have a look at how it's used. So we can see in this first example, "the mood of our directors varies from pessimism to optimism."   And the next one   "private conversations a deep mood of pessimism." If you actually read through, this is the typical use, so actually in the Beige Book, they don't seem to use the word pessimism in the ways that I might fear,   "optimism is increasing."   So I actually feel okay about adding "pessimism" here, so let's add it to our sentiment dictionary.   So if I just hover right over it,   you can see we bring up some options of what to do with it. So here I can, let's say, give it a score of -60.   And so now that will be added to our dictionary with that corresponding score, and it's triggering updated sentiment scoring in JMP. So that is, it's now looking for the word "pessimism" and adjusting all the document scores where it finds it.   So let's go back up now to take a look at our sentiment terms in more detail. If I scroll on down, you will find "pessimism"   right here with the score of -60 that I just gave it. Now I might want to actually...if you notice, "pessimistic" is, by default, has a score of -50, so maybe I just type -50 in here to make that consistent.   And I could but I'm not going to, just so that we don't trigger another update.   You'll also notice, right here, this list of possible sentiment terms. So these are terms that JMP has identified as maybe bearing sentiment, and you might want to take a look at them and consider them for inclusion in your dictionary.   For example, the word "strong" here, if you look at some of the document texts to the right, you might say, okay, this is clearly a positive thing. And if you've looked at a lot of these examples, it really stands out that   the word "strong" and correspondingly "weak" are words that these economists use a whole lot to talk about things that are good or bad about the current economy.   So I could add them, or add "strong" here by clicking on, let's say, positive 60 in the buttons up there. Again, I won't right now, just for the sake of expediting our look at sentiment analysis.   So we could go through, just like our texts down below, we could go through our sentiment term list here to choose some good candidates.   Under the red triangle, we also can just manage the sentiment terms more directly, so that is just in the full   term management lists that we might be used to for a Text Explorer user, so like the phrase management and the stop word management.   You can see we've added one new term local to this particular analysis, in addition to all of our built-in terms. Of course, we could declare exceptions too, if we want to just maybe not actually include some of those.   And importantly, you can import and export your sentiment dictionary as well. Another way to declare sentiment terms is to consult with subject matter experts. You know,   economists would probably have a whole lot to say about the types of words they would expect to see that would clearly convey   positive or negative emotion in these reports. 
And if we could talk to them, we would want to know what they have to say about that, and we might even be able to derive a dictionary in, say, a tab-separated file with them and then just import it here. And then, of course, once we make a dictionary we feel good about, we should export it and save it so that it's easy to apply again in the future. So that's a little bit about how you would actually curate your sentiment dictionary. It would also be important to curate your intensifier terms and your negation terms, and again, you don't see scores here, because these are just polarity reversals. Just to show you a little bit more about what that actually looks like, let's take a look at sentiment here, so we can see instances in which JMP has found the word "not" before some of these sentiment terms and actually applied the appropriate score. So at the top there, "not acceptable" gets a score of -25. So I show you that just to, kind of, draw your attention to the fact that these negators and intensifiers are being applied automatically by JMP. But anyway, let's move on from talking about how to set the analysis up to actually looking at the results. So I'm going to bring up a version of the analysis that's already done, that is, I've already curated the sentiment dictionary, and we can actually start to interpret the results that we get out. So we have our high-level summary up here, so we have more positive than negative documents. As we discussed before, we can see, you know, how many of each. In fact, at the bottom of that table on the left, you see that we have one document that has no sentiment expressed in it whatsoever. Then we have our sentiment term list, with "strong" occurring about 14,000 times and "weak" occurring approximately 4,500 times, and looking at these either by their counts or their scores, looking at the most negative and positive, even looking at them in context, can be pretty informative in and of itself. I mean, especially in, say, a domain like consumer research, if you want to know when people are feeling positively or expressing positivity about our brand or some products that we have, what type of language are they using, maybe that would inform marketing efforts, let's say. This list of sentiment terms can be highly informative in that regard. Now, these reports touch on many subdomains of the economy: manufacturing, tourism, investments. And sometimes we want to zero in on one of those subdomains in particular, what we might call a feature. And if I go to this features tab in sentiment analysis, I'll click search. JMP finds some words that commonly occur with sentiment terms and asks if you want to maybe dive into the sentiment with respect to that feature. So take, for example, "sales." We can see "sales were reported weak," "weakening in sales," "sales are reported to be holding up well" and so forth. So if I just score this selected feature, now what JMP will do is provide sentiment scores only with respect to mentions of "sales" inside these larger reports, and this is going to help us refine our analysis or focus it on a really well-defined subdomain. And that's particularly important. It could be the case that the domain of the language that we're analyzing isn't, you know, so well restricted. Take, for example, product reviews. You're interested in how positive or negative people feel about the product, but they might also talk about, say, the shipping, and you don't necessarily care if they weren't too happy with the shipping, mainly because it's beyond your control.
You wouldn't want to just include a bunch of reviews that also comment on that shipping. And so it's important to consider the domain of analysis and restrict it appropriately and the feature finder here is one way of doing that.   So you can see now that I've scored on "sales," we have a very different distribution of positive and negative documents. We have more documents that don't have a sentiment score because they   simply don't talk about sales or don't use emotional language to discuss it, and we have a different list of sentiment terms now capturing sales in particular.   Let me remove this.   One thing I realized I forgot to mention, I mentioned it briefly, is how these overall document scores that we've been looking at are calculated, and I said that they're the average of all the sentiment terms of...   that occurred in a particular document. So let's look at Document 1. I'd just like to show you that   if you're ever wondering where does this score come from, let's say, -20, you can just run your mouse right over and it'll show you a list of all the sentiment terms that appeared.   And you can see, here we have 16 of them, including all at the bottom, "real bright spot," which was a +78 and then, if you divide...add all those scores up. divide by 78...   or divide by 16, excuse me, then you get an average sentiment of -20. And this is one of two ways to provide overall scores. Another one is a min max scoring, so differences between minimum and maximum   sentiments expressed in the text.   Now we can get a lot of information from looking at just this report, you know, obviously sentiment scores, the most common terms.   But oftentimes we want to take the sentiments and use them in some other way, like   look at them graphically, like we did back in the slides. So when it comes time for that part of your analysis, just head on up to the red triangle here   and save the document scores. And these will save those scores back to your data table so that you can enter them into further analyses or graph them, whatever it is you want to do.   So that's a sneak preview of sentiment analysis coming to Text Explorer in JMP Pro 16. The take-home idea is that sentiment analysis uses a sentiment dictionary that   you set up to provide scores corresponding to the positive and negative emotional content of each document, and then from there, you can use that information in any way that's informative to you.   So we'll leave sentiment analysis behind now and I'm going to move on back to our slides to talk about the other technique coming to Text Explorer soon.   And that is term selection, where term selection answers a different question, and that is, which terms are most strongly associated with some important variable or variable that I care about?   We're going to stick with the Beige Book.   We're going to ask which words are statistically associated with recessions. So in the graph here, we have over time, the percent change   GDP (gross domestic product) quarter by quarter, where blue shows economic growth, red shows economic contraction. And we might wonder, well, what   terms, as they appear in the Beige Book, might be statistically associated with these periods of economic downturn? For example, a few of them right here.   You know, why would we want to associate specific terms in the Beige Book with periods of economic downturn?   Well, it could potentially be informative in and of itself to obtain a list of terms. 
You know, I might find some potentially, you know, subtle   drivers of or effects of recessions that I might not be aware of or easily capture in quantitative data.   I might also find some words for further analysis. I might...I might find some   potential sentiment terms, some terms that are being used when the economy is doing particularly poorly that I missed my first time around when I was doing my sentiment analysis.   Or maybe I could find some words that are so strongly associated with recessions that I think I might be able to use them in a predictive model to try to figure out when recessions might happen in the future.   So there are a few different reasons why we might want to know which words are most strongly associated with recessions.   So, how does this work in JMP? Well, we we basically build a regression model where the outcome variable is that variable we care about, recessions, and the inputs are all the different words.   The data as entered into the model takes the form of a document term matrix, where each row corresponds to   one document or one Beige Book report, and then the columns capture the words that occur in that report. Here we have the column "weak" highlighted and it says "binary," which means that   it's going to contain 1s and 0s; a 1 indicates that that report contained to the word "weak" and 0 indicates that that report didn't contain the word "weak." And this is one of several ways we could kind of score the documents, but we'll we'll stick with this binary for now.   So we take this information and we enter it into a regression model. So here's the what the launch would look like.   We have our recession as our Y variable and that's just kind of a yes or no variable, and then we have all of these binary term predictors entered in as our model effects.   And then we're going to use JMP Pro's generalized regression tool   in order to build the model, and that's because generalized regression or GenReg, as we call it, includes automatic variable selection techniques. So if you're familiar with   regularized regression, we're talking like the Lasso or the elastic net. And if you don't know what that means, that's totally fine. The idea is that it will automatically   find which terms are most strongly associated with the outcome variable "recession," and then ones that it doesn't think are associated with it, it will zero those out.   And this allows us to look at relationships between "recession" and perhaps you know hundreds, thousands of possible words that that would be associated with them.   So what do we get when we run the analysis?   We get a model out. So what we have here is the equation for that model. Don't worry about it too much. the idea is that we say   the log odds of recession, so just it's a function of the probability that we're in a recession and when the Beige Book is issued is a function of   all the different words that might occur in the Beige Book report.   And you can see, we have, you know, the effect of the occurrence of the word "pandemic" with a coefficient of 1.94.   That just means that the log odds of "recession" go up by 1.94 if the Beige Book report mentions the word "pandemic." Then we see minus 1.02 times "gain." Well, that means if the Beige Book report mentions the word "gain," then the probability of recession...   or the log odds of recession drops by 1.02.   
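Written out, the model just described is a logistic regression on binary term indicators. With the two coefficients quoted in the talk (the rest of the selected terms enter the same way), it has the form, in LaTeX:

\log\frac{\Pr(\text{recession})}{1 - \Pr(\text{recession})} = \beta_0 + 1.94\,x_{\text{pandemic}} - 1.02\,x_{\text{gain}} + \cdots

where each x_term is 1 if the report contains that word and 0 otherwise. One consequence of the quoted coefficient is that a report mentioning "pandemic" multiplies the estimated odds of recession by about e^{1.94}, roughly 7, holding the other terms fixed.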
So we get out of that are a list of terms that are strongly associated with an increase in the probability of recession, things like "pandemic," "postponement," "cancellation," "foreclosures."   And we also get a list of terms that are associated with a decreasing probability of recession, so like "gain," "strengthen," "competition."   We also see "manufacturing" right there, but it's got a relatively small coefficient, about -.2.   And you'll actually notice here, and if we if we look at a graphical representation of all the terms that are selected in this analysis, you don't see too many specific domains like "manufacturing,"   "tourism," "investments" and all that. That's because those things are always talked about, whether we're in a recession or not, so what we're really looking for words that are used,   you know, when we're in a recession more often than you would expect by chance. So we have...for example, those are   "pandemic" being the most predictive. Makes a lot of sense. We're not talking about pandemics at all until pretty recently and then we've also experienced the recession recently, so we've picked up on that pretty clearly.   Then we have a few others in this, kind of, second tier, so that's "postponed" "cancel," "foreclosed," "deteriorate," "pessimistic."   And it's kind of interesting, this "postponement" and "cancellation" being associated with recessions. It makes sense, you really want to talk about postponing, say, economic activity   when a recession is happening, or at least that's perhaps a reliable trend. It's...so that that's insight, in and of itself. In fact, I   mean, I couldn't tell you how the Federal Reserve tracks postponing or canceling of economic activities, but the the fact that those terms, get flagged in this analysis suggests that's something probably worth tracking.   Alright, so that's term selection. We actually get this nice list of terms associated with recessions out and we can see the relative strength of association. Now let's actually see that briefly here, how it's done in JMP.   So I'm gonna head back on over to JMP and what we're going to do is pull up a slightly different data table. It's still Beige Book data, though, now we have just the national reports.   And we have this accompanying variable whether or not the US was in a recession at the time. And of course there's some auto correlation in these data. I mean, if we're in a recession last month, it's more likely we're going to be in a recession this month than if we weren't.   And you know that typically could be an issue for regression based analyses, but this is purely exploratory. We're not too too concerned with it.   So I'm going to just pull Text Explorer up from a table scripts just because we've kind of seen how it's launched before.   Note though that I've done some phrasing already, as we did before. I've also added some stop words, you can't see here, but words that I don't want them necessarily returned by this analysis.   And I've turned on stemming, which is what those little dots in the term list mean. For example, this for "increase" now is actually collapsing across "increases," "increasing," "increasingly."   And that's because now I, kind of, consider those all the same concept, and I just want to know if, you know, that concept is associated with recessions.   So to invoke term selection, we'll just go to the red triangle, and I'll select it here.   
We get a control window popping up first, or I should say section, where we select which variable we care about, that's recessions. Select the target level, so I want this to be in terms of the probability of recession, as opposed to the probability of no recession.   I can specify some model settings. If you're familiar with GenReg, you'll see that you can choose whether you want early stopping, whether you want one of two different   penalizations to perform variable selection, what statistic you want to perform validation. And if that stuff is new to you, don't worry about it too much. The default settings are good way to go at first.   We have our weighting, if you remember, we had the 1s and 0s in that column for "weak," just saying whether the word occurred in a document or not, but you can select what you want. So   for example, frequency is not, did "weak" occur or not, it's how many times did it occur. And this kind of affects the way you would interpret your results. We're going to stick with binary for now.   But I'm going to say, I want to check out the 1,000 most frequent terms, instead of the 400 by default, which you can see, that's a lot more than 436, and normally you can't fit a model with 1,000 Xs but only 436   observations, but thanks to the   automatic variable selection in generalized regression, this isn't a problem. So once again it selects which of these thousand terms are most strongly related, hence the name term selection.   So I'm gonna run this. You can see what has happened is JMP has called out to the generalized regression platform and returned these results, where up here, we see some information about the model as it was fit. For example, we have 37 terms that were returned.   Let me just move that over. Because over here on the right is where we find some really valuable information. This is the list of terms most strongly associated with recession.   Now I'll sort these by the coefficient to find those most strongly associated with the probability of recession, so once again that's "pandemic" "postponement" "cancellations" and, as you might expect, at this point if I click on one of these, it'll update these previews   or these text snippets down below, so we can actually see this word in context.   So this list of terms in itself could be incredibly valuable. You, you might learn some things about specific terms or concepts that are important that you might not have known before. You can also use these terms in predictive models.   Now a few other things to note.   You can see down here, we have once again a table by each individual document, but instead of sentiment scores, we now have basically scores from the model. We have for each one   the...   what we call the positive contribution, so this is the contribution of positive coefficients predicting the probability of recession. Here we have the ones on the negative end.   And then we even have the probability of recession from the model, 71.8% here and then what we actually observed.   And we're not building a predictive model here necessarily, that is, I'm not going to use this model to try to predict recessions. I mean, I have all kinds of economic indicator variables I would use   too, but this is a good way to basically sanity check your model. Does it look like it's getting some of the its predictions right?   Because if it's not, then you might not trust the terms that it returns. You also have plenty of other information to try to make that judgment. 
I mean, we have some fits statistics, like the area under the curve up here.   Or we can even go into the generalized regression platform, with all of its normal options for assessing your model there further as well.   I'm not going to get into the details there, but all of that is available to you so that you can assess your model, tweak your model how you like, to make sure you have a list of terms that you feel good about.   Now you see right here, we have this, under the summary, this list of models and that's because you might actually want to run multiple models. So if I go back to the model...oh, excuse me...if we go back to our   settings up here, I could actually run a slightly different model. Maybe, for example, I know that the BIC is a little more conservative than the AICc and I want to return fewer terms, maybe did an analysis that returned 900 terms and you're a little overwhelmed.   So I'll click run and build another model using that instead.   And now we have that model added to our list here, and I can switch between these models to see the results for each one. In this case, we've returned only 14 terms, instead of 37 and I would go down to assess them below.   So two big outputs you would get from this, of course, is this term list. If you want to save that out, these are just important terms to you and you want to keep track of them, just right click and make this into a data table. Now I have a JMP table   with the terms, their coefficients and other information.   And   if what you want to do is actually kind of write this back to your original data table, maybe, so that you can use the information in some other kind of analysis or predictive model,   just head up to term selection and say that you want to save the term scores as a document term matrix, which if I bring our data table back here, you can see I've now written to their   columns for each of these terms that have been selected. In this case filled in with their coefficient values, and now I can use this   quantitative information however I like.   That's just a bit then about term selection. Again, the big idea here is I have found a list of terms that are related to a variable I care about and I even have, through their coefficients, information about how strong that relationship might be.   So let's just wrap up then. We've covered two techniques. We just saw term selection, finding those important words. Before that we reviewed sentiment analysis, all about   quantifying the degree of positive or negative emotion in a text. These are two new text mining techniques coming to JMP Pro 16's Text Explorer. We're really excited for you to get your hands on them and look forward to your feedback.
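As a recap of the term selection mechanics described above: JMP Pro implements this through the Generalized Regression platform, but the same basic recipe (a binary document-term matrix fed to a lasso-penalized logistic regression that zeroes out unrelated terms) can be sketched with open-source tools. The tiny table, column names, and penalty setting below are illustrative assumptions only, not the talk's data or settings.

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical table: one row per national Beige Book report,
# with the report text and a 0/1 recession indicator.
reports = pd.DataFrame({
    "Text": ["Sales weakened and many projects were postponed or canceled.",
             "Contacts reported gains and strengthening demand across districts."],
    "Recession": [1, 0],
})

vec = CountVectorizer(binary=True, max_features=1000)   # 1 = the term occurs in the report
X = vec.fit_transform(reports["Text"])
y = reports["Recession"]

# The L1 (lasso) penalty zeroes out terms with no evidence of association;
# larger C means a weaker penalty, so more terms survive.
model = LogisticRegression(penalty="l1", solver="liblinear", C=5.0)
model.fit(X, y)

coefs = pd.Series(model.coef_[0], index=vec.get_feature_names_out())
print(coefs[coefs != 0].sort_values(ascending=False))   # the "selected" terms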
William Q. Meeker, Professor of Statistics and Distinguished Professor of Liberal Arts and Sciences, Iowa State University Peng Liu, JMP Principal Research Statistician Developer, SAS   The development of theory and application of Monte Carlo Markov Chain methods, vast improvements in computational capabilities and emerging software alternatives have made it possible for the wide use of Bayesian methods in many areas of application. Motivated by applications in reliability, JMP now has powerful and easy-to-use capabilities for using Bayesian methods to fit different single distributions (e.g., normal, lognormal, Weibull, etc.) and linear regression. Bayesian methods, however, require the specification of a prior distribution. In many applications (e.g., reliability) useful prior information is typically available for only one parameter (e.g., imprecise knowledge about the activation energy in a temperature-accelerated life test or about the Weibull shape parameter in analysis of fatigue failure data). Then it becomes necessary to specify noninformative prior distributions for the other parameter(s). In this talk, we present several applications showing how to use JMP Bayesian capabilities to integrate engineering or scientific information about one parameter and to use a principled way to specify noninformative or weakly informative prior distributions for the other parameters.     Auto-generated transcript...   Speaker Transcript Our talk today shows how to use JMP to do Bayesian estimation. Here's an overview of my talk. I'm going to start with a brief introduction to Bayesian statistical methods. Then I'm going to go through four different examples that happen to come from reliability, but the methods we're presenting are really much more general and can be applied in other areas of application. Then I'm going to turn it over to Peng and he's going to show you how easy it is to actually do these things in JMP. Technically, reliability is a probability. The probability of a system, vehicle, machine or whatever it is that is of interest, will perform its intended function under encountered operating conditions for a specified period of time. I highlight encountered here to emphasize that reliability depends importantly on the environment in which a product is being used. Condra defined reliability as quality over time. And many engineers think of reliability is being failure avoidance, that is, to design and manufacture a product that will not fail. Reliability is a highly quantitative engineering discipline, but often requires sophisticated statistical and probabilistic ideas. Over the past 30 years, there's been a virtual revolution where Bayes methods are now commonly used and in many different areas of application. This revolution started by the rediscovery of Markov chain Monte Carlo methods and was accelerated by spectacular improvements in computing power that we have today, as well as the development of relatively easy to use software to implement Bayes methods. In the 1990s we had BUGS. Today Stan and other similar packages have largely replaced BUGS, but the other thing that's happening is we're beginning to see more Bayesian methods implemented in commercial software. So for example, SAS has PROC MCMC. And now JMP has some very powerful tools that were developed for reliability, but as I said, they can be applied in other areas as well, and there's strong motivation for the use of Bayesian methods. 
For one thing, it provides a means for combining prior information with limited data to be able to make useful inferences. Also, there are many situations, particularly with random effects complications like censor data, where maximum likelihood is difficult to implement, but where Bayes methods are relatively easy to implement. There's one little downside in the use of Bayes methods. You have to think a bit harder about certain things, particularly about parameterization and how to specify the prior distributions. My first example is about an aircraft engine bearing cage. These are field failure data where there was 1,703 aircraft engines that contained this bearing cage. The oldest ones had 2,220 hours of operation. The design life specification for this bearing cage was that no more than 10% of the units would fail by 8,000 hours of operation. However, 6 units had failed and this raised the question do we have a serious problem here? Do we need to redesign this bearing cage to meet that reliability condition? This is an event plot of the data. The event plot illustrates the structure of the data, and in particular, we can see the six failures here. In addition to that, we have right censored observations, indicated here by the arrows. So these are units that are still in service and they have not failed yet, and the right arrow indicates that all we know is if we wait long enough out to the right, the units will eventually fail. Here's a maximum likelihood analysis of those data, so the probability plot here suggests that the Weibull distribution provides a good description of these data. However, when we use the distribution profiler to estimate fraction failing at 8,000 hours, we can see that the confidence interval is enormous, ranging between about 3% all the way up to 100%. That's not very useful. So likelihood methods work like this. We specify the model and the data. That defines the likelihood and then we use the likelihood to make inferences. Bayesian methods are similar, except we also have prior information specified. Bayes theorem combines the likelihood with the prior information, providing a posterior distribution, and then we use the posterior distribution to make inferences. Here's the Bayes analysis of the bearing cage. The priors are specified here for the B10 or time at which 10% would fail. We have a very wide interval here. The range, effectively 1,000 hours up to 50,000 hours. Everybody would agree that B10 is somewhere in that range. For the Weibull shape parameter, however, we're going to use an informative prior distribution based upon the engineers' knowledge of the failure mechanism and their vast previous experience with that mechanism. They can say with little doubt that the Weibull shape parameter should be between 1.5 and 3, and here's where we specify that information. So instead of specifying information for the traditional Weibull parameters, we've reparameterized, where now the B10 is one of the parameters, and here's the specified range. And then we have the informative prior for the Weibull shape parameter specified here. And then JMP will generate samples from the joint posterior, leading to these parameter estimates and confidence intervals shown here. Here's a graphical depiction of the Bayes analysis. The black points here are a sample from the prior distribution, so again very wide for the .1 quantile and somewhat constrained for the Weibull shape parameter beta. 
On the right here, we have the joint posterior, which in effect is where the likelihood contours and the prior sample overlap. And then those draws from the joint posterior are used to compute estimates and Bayesian credible intervals. So here's the same profiler that we saw previously where the confidence interval was not useful. After bringing in the information about the Weibull shape parameter, now we can see that the confidence interval ranges between 12% and 83%, clearly illustrating that we have missed the target of 10%. So what have we learned here? With a small number of failures, there's not much information about reliability. But engineers often have information that can be used, and by using that prior information, we can get improved precision and more useful inferences. And Bayesian methods provide a formal method for combining that prior information with our limited data. Here's another example. Rocket motor is one of five critical components in a missile. In this particular application, there were approximately 20,000 missiles in the inventory. Over time, 1,940 of these missiles had been fired and they all worked, except in three cases, where there was catastrophic failure. And these were older missiles, and so there was some concern that there might be a wearout failure mechanism that would put into jeopardy the roughly 20,000 missiles remaining in inventory. The failures were thought to be due to thermal cycling, but there was no information about the number of thermal cycles. We only have the age of the missile when it was fired. That's a useful surrogate, but the effect of using a surrogate like that is you have more variability in your data. Now in this case, there were no directly observed failure times. When a rocket is called upon to operate and it operates successfully, all we know is that they had not failed at the age of those units when they were asked to operate. And for the units that failed catastrophically, again, we don't know the time that those units failed. At some point before they were called upon to operate. They had failed, so all we know is that the failure was before the age at which it was fired. So as I said, there was concern that there is a wear out failure mechanism kicking in here that would put into jeopardy the amount of remaining life for the units in the stockpile. So here's the table of the data. Here we have the units that operated successfully, and so these are right censored observations, but these observations here are the ones that failed and as I said, at relatively higher ages. This is the event plot of the data and again, we can see the right censored observations here with the arrow pointing to the right, and we can see the left censored observations with the arrow pointing to the left indicating the region of uncertainty. But even with those data we can still fit a Weibull distribution. And here's the probability plot showing the maximum likelihood estimate and confidence bands. Here's more information from the maximum likelihood analysis. And here we have the estimate of fraction failing at 20 years, which was the design life of these rocket motors, and again the interval is huge, ranging between 3% and 100%. Again, not very useful. But the engineers, knowing what the failure mechanism was, again had information about the Weibull shape parameter. 
The maximum likelihood estimate was extremely large and the engineers were pretty sure that that was wrong, especially with the extra variability in the data that would tend to drive the Weibull shape parameter to a lower value. As I showed you on the previous slide, the confidence interval for fraction failing at 20 years was huge. So once again, we're going to specify a prior distribution and then use that in a Bayes analysis. Again, the prior for B10, the time at which 10% will fail, is chosen to be extremely wide. We don't really want to assume anything there, and everybody would agree that that quantity is somewhere between five years and 400 years. But for the Weibull shape parameter, we're going to assume that it's between one and five. Again, we know it's greater than one because it's a wear out failure mechanism, and the engineers were sure that it wasn't anything like the number 8 that we had seen in the maximum likelihood estimate. And indeed, five is also a very large Weibull shape parameter. Once again, JMP is called upon to generate draws from the posterior distribution. And here are plots similar to the ones that we saw in the bearing cage example. The black points here, again, are a sample from the prior distribution. Again, very wide in terms of the B10, but somewhat constrained for the beta, so the beta is an informative prior distribution. And again the contour plots represent the information in the limited data. And our posterior, once again, is where we get overlap between the prior and the likelihood, and we can see it here. So once again we have a comparison between the maximum likelihood interval, which is extremely wide, and the interval that we get for the same quantity using the Bayes inference, which incorporated the prior information on the Weibull shape parameter. And now the interval ranges between .002 and .098, or about .1, about 10% failing, so that might be acceptable. Some of the things that we learned here: even though there were no actual failure times, we can still get reliability information from the data, but with very few failures there isn't much information there. But we can use the engineers' knowledge about the Weibull shape parameter to supplement the data to get useful inferences, and JMP makes this really easy to do. My last two examples are about accelerated testing. Accelerated testing is a widely used technique to get information about reliability of components quickly when designing a product. The basic idea is to test units at high levels of variables like temperature or voltage to make things fail quickly and then to use a model to extrapolate back down to the use conditions. Extrapolation is always dangerous and we have to keep that in mind. That's the reason we would like to have our model be physically motivated. So here's an example of an accelerated life test on a laser. Units were tested at 40, 60 and 80 degrees C, but the use condition was 10 degrees C. That's the nominal temperature at the bottom of the Atlantic Ocean, where these lasers were going to be used in a new telecommunications system. The test lasted 5,000 hours, a little bit more than six months. The engineers wanted to estimate fraction failing at about 30,000 hours. That's about 3.5 years, and again, at 10 degrees C. Here are the results of the analysis. In order to appropriately test and build the model, JMP uses these three different analyses. The first one fits separate distributions to each level of temperature.
The next model does the same thing, except that it constrains the shape parameter Sigma to be the same at every level of temperature. This is analogous to the constant Sigma assumption that we typically make in regression analysis. And then finally, we fit the regression model, which, in effect, is a simple linear regression connecting lifetime to temperature. And to supplement this visualization of these three models, JMP does likelihood ratio tests to test whether there's evidence that the Sigma might depend on temperature and then to test whether there's evidence of lack of fit in the regression model. And from the large P values here, we can see that there's no evidence against this model. Another way to plot the results of fitting this model is to plot lifetime versus temperature on special scales: a logarithmic scale for hours of operation and what's known as an Arrhenius scale for temperature, corresponding to the Arrhenius model, which describes how temperature affects reaction rates, and thereby lifetime. And these are the results of the maximum likelihood estimation for our model. The JMP distribution profiler gives us an estimate of the fraction failing at 30,000 hours. And we can see it ranges between .002 and about .12, or 12% failing. The engineers in applications like this, however, often have information about what's known as the effective activation energy of the failure mechanism, and that corresponds to the slope of the regression line in the Arrhenius model. So we did a Bayes analysis, and in that analysis, we made an assumption about the effective activation energy. And that's going to provide more precision for us. So what we have here is a matrix scatterplot of the joint posterior distribution after having specified prior distributions for the parameters, weakly informative for the .1 quantile at 40 degrees C. Again, everybody would agree that that number is somewhere between 100 and 32,000. Also weakly informative for the lognormal shape parameter. Again, everybody would agree that that number is somewhere between .05 and 20. But for the slope of the regression line, we have an informative prior ranging between .6 and .8, based upon previous experience with the failure mechanism. And that leads to this comparison, where now on the right-hand side here, the interval for fraction failing at 30,000 hours is much narrower than it was with the maximum likelihood estimate. In particular, the upper bound now is only about 4% compared with 12% for the maximum likelihood estimates. So lessons learned. Accelerated tests provide reliability information quickly, and engineers often have information about the effective activation energy. And that can be used to either improve precision or to reduce cost by not needing to test so many units. And once again, Bayesian methods provide an appropriate method to combine the engineers' knowledge with the limited data. My final example concerns an accelerated life test of an integrated circuit device. Units were tested at high temperature and the resulting data were interval censored. That's because failures were discovered only during inspections that were conducted periodically. In this test, however, there were only failures at the two high levels of temperature. The goal of the test was to estimate the .01 quantile at 100 degrees C. This is a table of the data, where we can see the failures at 250 and 300 degrees C, and no failures, all right censored, at the three lower levels of temperature.
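For reference, the Arrhenius life-temperature relationship referred to in these accelerated test examples is commonly parameterized in the reliability literature as follows; the exact form is an assumption about the convention behind the "Arrhenius Celsius" relationship used later in the demo, stated here only as background:

\mu(T) = \beta_0 + \beta_1 \cdot \frac{11605}{T + 273.15}

where T is temperature in degrees Celsius, \mu(T) is the location parameter of the log-location-scale life distribution (lognormal here), \beta_1 is the effective activation energy in electron volts, and 11605 is approximately the reciprocal of Boltzmann's constant in eV per kelvin. In this parameterization, an informative prior on the slope, such as the 0.6 to 0.8 range mentioned in the talk, would be a prior on \beta_1.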
Now when we did the maximum likelihood estimation, in this case, we saw strong evidence that the Weibull shape parameter depended on temperature. So the P value is about .03. That turns out to be evidence against the Arrhenius model, and that's because the Arrhenius model should only scale time. But if you change the shape parameter by increasing temperature, you're doing more than scaling time. And so that's a problem, and it suggested that at 300 degrees C, there was a different failure mechanism. And indeed, when the engineers followed up and determined the cause of failure of the units at 250 and 300, they saw that there was a different mechanism at 300. What that meant is that we had to throw those data away. So what do we do then? Now we've only got failures at 250 degrees C, and JMP doesn't do very well with that. Surprisingly, it actually runs and gives answers, but the confidence intervals are enormously wide here, as one would expect. But the engineers knew what the failure mechanism was and they had previous experience, and so they could bring that information about the slope into the analysis using Bayes methods. So again, here's the joint posterior, and the width of the distribution in the posterior for beta 1 is effectively what we assumed when we put in a prior distribution for that parameter. So again, here's the specification of the prior distributions, where we used weakly informative priors for the quantile and for Sigma, but an informative prior distribution for the slope beta 1. And I can get an estimate of the time at which 1% fail. So the lower end point of the confidence interval for the time at which 1% will fail is more than 140,000 hours. So that's about 20 years, much longer than the technological life of the products in which this integrated circuit will be used. So what did we learn here? Well, in some applications we have interval censoring because failures are discovered only when there's an inspection. We need appropriate statistical methods for handling such data, and JMP has those methods. If you use excessive levels of an accelerating variable like temperature, you can generate new failure modes that make the information misleading. So we had to throw those units away. But even with failures at only one level of temperature, if we have prior information about the effective activation energy, we can combine that information with the limited data to make useful inferences. Finally, some concluding remarks. Improvements in computing hardware and software have greatly advanced our ability to analyze reliability and other data. Now we can also use Bayes methods, providing another set of tools for combining information with limited data, and JMP has powerful tools for doing this. So, although these Bayesian capabilities were developed for the reliability part of JMP, they can certainly be used in other areas of application. And here are some references, including the 2nd edition of Statistical Methods for Reliability Data, which should be out probably in June of 2021. OK, so now I'm going to turn it over to Peng and he's going to show you how easy it is to do these analyses. Thank you, professor. Before I start my demonstration, I would like to show this slide about the Bayesian analysis workflow in Life Distribution and Fit Life by X. First, you need to fit a parametric model using maximum likelihood. I assume you already know how to do this in these two platforms.
Then you need to find the model specification graphical user interface for Bayesian estimation within the report from the previous step. For example, this is a screenshot of a Weibull model in Life Distribution. You need to go to the red triangle menu and choose Bayesian Estimates to reveal the graphical user interface for the Bayesian analysis. In Fit Life by X, please see the screenshot of a lognormal result; the graphical user interface for the Bayesian inference is at the last step. After finding the graphical user interface for Bayesian analysis, you will need to supply the information about the priors. You need to decide the prior distributions for the individual parameters. You need to supply the information for the hyperparameters and additional information such as the probability for the quantile. In addition to that, you need to provide the number of posterior samples. You also need to provide a random seed in case you want to replicate your result in the future. Then you can click Fit Model. This will generate a report of the model. You can fit multiple models in case you want to study the sensitivity of different Bayesian models given different prior distributions. The result of a Bayesian model includes the following things: first, the method of sampling; then a copy of the priors; then the posterior estimates of the parameters. And then there are some scatterplots, for the prior and the posterior. In the end, we have two profilers, one for the distribution and one for the quantile. Using these results, you can make further inferences such as failure prediction. Now let's look at the demonstration. We will demonstrate with the last example that the professor mentioned in his presentation, the IC device. We have two columns for the time to event, HoursL and HoursU, to represent the censoring situation. We have a count for each observation and a temperature for each observation. We exclude the last four observations because they are associated with a different failure mode, and we want to exclude them from the analysis. Now we start to specify the model in the launch window. We put the hours columns into Y. We put Count into the frequency role. We put Degrees C into X. We use Arrhenius Celsius for our relationship. We use lognormal for our distribution. Then click OK. The result is the maximum likelihood inference for the lognormal. We go to the Bayesian Estimates and start to specify our priors as the professor did in his presentation. We choose a lognormal prior for the quantile. It's at 250 degrees C, and it's the B10 life, so the probability is 0.1. The two ends of the lognormal distribution are 100 and 10,000. Now we specify the slope. The distribution is lognormal, and the two ends of the distribution are .65 and .85, because an informative prior requires the range to be narrow. Now we specify the prior distribution for Sigma, which is a lognormal with a wide range, .05 to 5. We decide to draw 5,000 posterior samples and assign an arbitrary random seed. And then we click on Fit Model, and here is the report generated for this specification. The method is simple rejection. And here's a copy of our prior specification. The posterior estimates summarize our posterior samples. You can export the posterior samples either by clicking this Export Monte Carlo Samples button or by choosing it from the menu, which is here: Export Monte Carlo Samples. The posterior samples are illustrated in these scatterplots. We have two scatterplots here.
The first one uses the same parameterization as the prior specification, which uses the quantile, the slope, and Sigma. The second scatterplot uses the traditional parameterization, which includes the intercept of the regression, the slope of the regression, and Sigma. In the end, to make inferences, we can look at the profilers. Here let's look at the second profiler, the quantile profiler, so we can find the same result as what the professor had shown in one of the previous slides. Enter 0.1... 0.01 for probability. So this is 1%. We enter 100 degrees C for DegreesC. And we adjust the axes. So now we see a profiler similar to the one that was in the previous slide. And we can read off the Y axis to get the result we want, which is the time at which 1% of the devices will fail at 100 degrees C. So this concludes my demonstration. And let me move on. This slide explains JMP's implementation of the sampling algorithm. We have seen that simple rejection showed up in the previous example, and this is the first stage of our implementation. The simple rejection algorithm is a tried and true method to draw samples, but it can be impractical if the rejection rate is high. So if the rejection rate is high, our implementation switches to the second stage, which is a Random Walk Metropolis-Hastings algorithm. The second algorithm is efficient, but in some situations it can fail undetectably if the likelihood is irregular, for example, if the likelihood is rather flat. We designed this implementation because we have situations where there are very few failures or even no failures. In that situation the likelihood is relatively flat, but if we use the simple rejection algorithm, the rejection rate is not that bad and this method will suffice. When we have more and more failures, the likelihood becomes more regular, so it has a peak in the middle. In that situation, the simple rejection method becomes impractical because of the high rejection rate, but the Random Walk algorithm becomes more and more likely to succeed without failure. So this is our implementation and the explanation of why we do it this way. This slide explains how we specify a truncated normal prior in these two platforms, because the truncated normal is not a built-in prior distribution in these two platforms. First, look at what a truncated normal is. Here we give an example of a truncated normal with two ends at 5 and 400. The two ends are illustrated by this L and this R. A truncated normal is nothing but a normal with all of the negative values discarded; the underlying normal, represented by this curve, is the equivalent normal distribution for this particular truncated normal distribution. If we want to specify this truncated normal, we need to define its equivalent normal distribution with two ends that have the same mu and Sigma parameters as those of the truncated normal distribution. So we provide a script to do this. In this script, the calculation is the following. We find the mu and Sigma from the truncated normal, so we can get the equivalent normal distribution. And with the mu and Sigma of this normal distribution, we can find the two ends of this normal distribution. So to specify the truncated normal with its two end values, we specify the equivalent normal distribution with these two end values. This is how we specify a truncated normal in these two platforms using an equivalent normal distribution. And here's the content of the script.
All you need to do is call a function, which here is the one that converts the truncated normal parameter values to the equivalent normal values. What you need to provide are the two ends of the truncated normal distribution, and it will give the two ends of the equivalent normal distribution, and you can use those two numbers to specify the prior distribution. So this concludes my demonstration. In my demonstration I showed how to start a Bayesian analysis in Life Distribution and Fit Life by X, how to enter prior information, and what the content of the Bayesian result is. I also explained what our implementation of the sampling is and why we do it that way. And in the end I explained how we specify a truncated normal prior using an equivalent normal prior in these two platforms. Thank you.
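The two-stage sampler Peng describes is not shown in code, but the first stage, simple rejection sampling with the prior as the proposal, can be sketched conceptually as follows. This is an illustration only, not JMP's implementation; the lognormal model, priors, and censored data below are made up.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
t_fail = np.array([3200.0, 4100.0, 4800.0])       # hypothetical exact failure times
t_cens = np.array([5000.0] * 20)                  # hypothetical right-censored units

def log_lik(mu, sigma):
    """Lognormal log likelihood with right censoring."""
    return (stats.lognorm.logpdf(t_fail, s=sigma, scale=np.exp(mu)).sum()
            + stats.lognorm.logsf(t_cens, s=sigma, scale=np.exp(mu)).sum())

# Draw parameters from (made-up) priors; the posterior is proportional to prior times likelihood.
n = 20000
mu_draws = rng.uniform(7.0, 10.0, n)
sigma_draws = rng.uniform(0.2, 2.0, n)
ll = np.array([log_lik(m, s) for m, s in zip(mu_draws, sigma_draws)])

# Simple rejection: accept each prior draw with probability L(theta) / max L(theta).
accept = rng.uniform(size=n) < np.exp(ll - ll.max())
posterior = np.column_stack([mu_draws[accept], sigma_draws[accept]])
print(posterior.shape[0], "accepted posterior draws out of", n)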
Christopher Gotwalt, JMP Director of Statistical R&D, SAS   There are often constraints among the factors in experiments that are important not to violate, but are difficult to describe in mathematical form. These constraints can be important for many reasons. If you are baking bread, there are combinations of time and temperature that you know will lead to inedible chunks of carbon. Another situation is when there are factor combinations that are physically impossible, like attaining high pressure at low temperature. In this presentation, we illustrate a simple workflow of creating a simulated dataset of candidate factor values. From there, we use the interactive tools in JMP's data visualization platforms in combination with AutoRecalc to identify a physically realizable set of potential factor combinations that is supplied to the new Candidate Set Design capability in JMP 16. This then identifies the optimal subset of these filtered factor settings to run in the experiment. We also illustrate the Candidate Set Designer's use on historical process data, achieving designs that maximize information content while respecting the internal correlation structure of the process variables. Our approach is simple and easy to teach. It makes setting up experiments with constraints much more accessible to practitioners with any amount of DOE experience.       Auto-generated transcript...   Transcript Hello, Chris Gotwalt here. Today, we're going to be constructing the history of graphic paradoxes and... oh wait, wrong topic. Actually we're going to be talking about candidate set designs, tailoring DOE constraints to the problem. So industrial experimentation for product and process improvement has a long history with many threads that I admit I only know a tiny sliver of. The idea of using observation for product and process innovation is as old as humanity itself. It received renewed focus during the Renaissance and Scientific Revolution. During the subsequent Industrial Revolution, science and industry began to operate more and more in lockstep. In the early 20th century, Edison's lab was an industrial innovation on a factory scale, but it was done, to my knowledge, outside of modern experimental traditions. Not long after R.A. Fisher introduced concepts like blocking and randomization, his associate and later son-in-law, George Box, developed what is now probably the dominant paradigm in design of experiments, with the most popular book being Statistics for Experimenters by Box, Hunter and Hunter. The methods described in Box, Hunter and Hunter are what I call the taxonomical approach to design. So suppose you have a product or process you want to improve. You think through the things you can change, the knobs you can turn, like temperature, pressure, time, ingredients you can use or processing methods that you can use. These things become your factors. Then you think about whether they are continuous or nominal, and if they are nominal, how many levels they take, or the range over which you're willing to vary them. If a factor is continuous, then you figure out the name of the design that most easily matches up to the problem and resources, that fits your budget. That design will have a name like a Box-Behnken design, a fractional factorial, or a central composite design, or possibly something like a Taguchi array.
There will be restrictions on the numbers of runs, the numbers of levels of categorical factors, and so on, so there will be some shoehorning of the problem at hand into a design that you can find. For example, factors in the BHH approach, the Box, Hunter and Hunter approach, often need to be whittled down to two or three unique values or levels. Despite its limitations, the taxonomical approach has been fantastically successful. Over time, of course, some people have asked if we could still do better. And by better we mean to ask ourselves, how do we design our study to obtain the highest quality information pertinent to the goals of the improvement project? This line of questioning led ultimately to optimal design. Optimal design is an academic research area. It was started in parallel with the Box school in the '50s and '60s, but for various reasons remained out of the mainstream of industrial experimentation until the custom designer in JMP. The philosophy of the custom designer is that you describe the problem to the software. It then returns you the best design for your budgeted number of runs. You start out by declaring your responses along with their goals, like minimize, maximize, or match target, and then you describe the kinds of factors you have: continuous, categorical, mixture, etc. Categorical factors can have any number of levels. You give it a model that you want to fit to the resulting data. The model assumes a least squares analysis and consists of main effects, interactions, and polynomial terms. The custom designer makes some default assumptions about the nature of your goal, such as whether you're interested in screening or prediction, which is reflected in the optimality criterion that is used. The defaults can be overridden with a red triangle menu option if you are wanting to do something different from what the software intends. The workflow in most applications is to set up the model. Then you choose your budget and click Make Design. Once that happens, JMP uses a mixed continuous and categorical optimization algorithm, solving for the number of factors times the number of rows terms. Then you get your design data table with everything you need except the response data. This is a great workflow when the factors are able to be varied independently from one another. But what if you can't? What if there are constraints? What if the values of some factors determine the possible ranges of other factors? Well, then you can define some factor constraints or use the disallowed combinations filter. Unfortunately, while these are powerful tools for constraining experimental regions, it can still be very difficult to characterize constraints using them. Brad Jones' DOE team (Ryan Lekivetz, Joseph Morgan and Caleb King) have added an extraordinarily useful new feature that makes handling constraints vastly easier in JMP 16. These are called candidate or covariate runs. What you can do is, off on your own, create a table of all possible combinations of factor settings that you want the custom designer to consider. Then load them up here, and those will be the only combinations of factor settings that the designer will look at. The original table, which I call a candidate table, is like a menu of factor settings for the custom designer. This gives JMP users an incredible level of control over their designs. 
What I'm going to do today is go over several examples to show how you can use this to make the custom designer fulfill its potential as a tool that tailors the design to the problem at hand. Before I do that, I'm going to get off topic for a moment and point out that in the JMP Pro version of the custom designer, there's now a capability that allows you to declare limits of detection at design time. If you enter non-missing values for the limits here, the custom designer will add a column property that informs the generalized regression platform of the detection limits, and it will then automatically get the analysis correct. This leads to dramatically higher power to detect effects and much lower bias in predictions, but that's a topic for another talk. Here are a bunch of applications that I can think of for the candidate set designer. The simplest is when the range of a continuous factor depends on the level of one or more categorical factors. Another example is when we can't control the ranges of factors completely independently, but the constraints are hard to write down. There are two methods we can use for this. One is using historical process data as a candidate set, and the other one is what I call filter designs, where you design a giant initial data set using random numbers or a space filling design and then use row selections in scatterplots to pick off the points that don't satisfy the constraints. There's also the ability to really highly customize mixture problems, especially situations where you've got multilayer mixturing. This isn't something that I'm going to be able to talk about today, but in the future this is something that you should be looking to be able to do with the candidate set designer. You can also do nonlinear constraints with the filtering method, the same way you can do other kinds of constraints. It's very simple and I'll have a quick example at the very end illustrating this. So let's consider our first example. Suppose you want to match a target response in an investigation of two factors. One is an equipment supplier, of which there are two levels, and the other one is the temperature of the device. The two different suppliers have different ranges of operating temperatures. Supplier A's is the narrower of the two, going from 150 to 170 degrees Celsius, but it's controllable to a finer level of resolution of about 5 degrees. Supplier B has a wider operating range, going from 140 to 180 degrees Celsius, but is only controllable to 10 degrees Celsius. Suppose we want to do a 12 run design to find the optimal combination of these two factors. We enumerate all possible combinations of the two factors in 10 runs in the table here, just creating this manually ourselves. So here are the five possible values of machine type A's temperature settings. And then down here are the five possible values of type B's temperature settings. We want the best design in 12 runs, which exceeds the number of rows in the candidate table. This isn't a problem in theory, but I recommend creating a copy of the candidate set just in case, so that the number of runs in your candidate table exceeds the number that you're looking for in the design. Then we go to the custom designer. Push the Select Covariate Factors button. Select the columns that we want loaded as candidate design factors. Now the candidate design is loaded and shown. Let's add the interaction effect, as well as the quadratic effect of temperature. 
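As a rough illustration of what that manually created candidate table contains, here is a small JSL sketch that enumerates the two suppliers' feasible temperature grids; the table and column names are hypothetical, since in the demo the table is simply built by hand.

// Hypothetical JSL sketch: enumerate the feasible factor combinations for the
// two-supplier example as a 10-row candidate table.
// Supplier A: 150 to 170 C in 5-degree steps; Supplier B: 140 to 180 C in 10-degree steps.
candidates = New Table( "Candidate Table",
	New Column( "Supplier", Character, "Nominal",
		Values( {"A", "A", "A", "A", "A", "B", "B", "B", "B", "B"} )
	),
	New Column( "Temperature", Numeric, "Continuous",
		Values( [150, 155, 160, 165, 170, 140, 150, 160, 170, 180] )
	)
);

This is the table that then gets loaded into the custom designer through the Select Covariate Factors button.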
Now we're at the final step before creating the design. I want to explain the two options you see in the Design Generation outline node. The first one will force in all the rows that are selected in the original table, or in the listing of the candidates in the custom designer. So if you have checkpoints that are unlikely to be favored by the optimality criterion and want to force them into the design, you can use this option. It's a little like taking those same rows and creating an augmented design based on just them, except that you are controlling the possible combinations of the factors in the additional rows. The second option, which I'm checking here on purpose, allows the candidate rows to be chosen more than once. This will give you optimally chosen replications and is probably a good idea if you're about to run a physical experiment. If, on the other hand, you are using an optimal subset of rows to try in a fancy new machine learning algorithm like SVEM, a topic of one of my other talks at the March Discovery conference, you would not want to check this option. Basically, if you don't have all of your response values already, I would check this box, and if you already have the response values, then don't. Reset the sample size to 12 and click Make Design. The candidate design in all its glory will appear just like any other design made by the custom designer. As we see in the middle JMP window, JMP also selects the rows in the original table chosen by the candidate design algorithm. Note that 10, not 12, rows were selected. On the right we see the new design table; the rightmost column in the table indicates the row of origin for that run. Notice that original rows 11 and 15 were chosen twice and are replicates. Here is a histogram view of the design. You can see that different values of temperature were chosen by the candidate set algorithm for different machine types. Overall, this design is nicely balanced, but we don't have 3 levels of temperature in machine type A. Fortunately, we can select the rows we want forced into the design to ensure that we have 3 levels of temperature for both machine types. Just select the rows you want forced into the design in the covariate table. Check the include all selected covariate rows into the design option. And then if you go through all of that, you will see that now both machine types have at least three levels of temperature in the design. So the first design we created is on the left, and the new design, forcing there to be 3 levels of machine type A's temperature settings, is over here to the right. My second example is based on a real data set from a metallurgical manufacturing process. The company wants to control the amount of shrinkage during the sintering step. They have a lot of historical data and have applied machine learning models to predict shrinkage, and so have some idea what the key factors are. However, to actually optimize the process, you should really do a designed experiment. As Laura Castro-Schilo once told me, causality is a property not of the data, but of the data-generating mechanism, and as George Box says on the inside cover of Statistics for Experimenters, to find out what happens when you change something, it is necessary to change it. Now, although we can't use the historical data to prove causality, there is essential information about what combinations of factors are possible that we can use in the design. 
We first have to separate the columns in the table that represent controllable factors from the ones that are more passive sensor measurements or derived quantities that cannot be controlled directly. A glance at the scatterplot of the potential continuous factors indicates that there are implicit constraints that could be difficult to characterize as linear constraints or disallowed combinations. However, these data represent a sample of the possible combinations, which can be used with the candidate designer quite easily. To do this, we bring up the custom designer. Set up the response. I'll load up some covariate factors. Select the columns that we can control as DOE factors and click OK. Now we've got them loaded. Let's set up a quadratic response surface model as our base model. Then select all of the model terms except the intercept. Then do a control plus right click and convert all those terms into if-possible effects. This, in combination with the response surface model chosen, means that we will be creating a Bayesian I-optimal candidate set design. Check the box that allows for optimally chosen replicates. Enter the sample size. It then creates the design for us. If we look at the distribution of the factors, we see that it has tried hard to pursue greater balance. On the left, we have a scatterplot matrix of the continuous factors from the original data, and on the right is the hundred-row design. We can see that in the sintering temperature, we have some potential outliers at 1220. One would want to make sure that those are real values. In general, you're going to need to make sure that the input candidate set is clear of outliers and missing values before using it as a candidate set design. In my talk with Ron Kennet at the March 2021 Discovery conference, I briefly demo how you can use the outlier and missing value screening platforms to remove the outliers and replace the missing values so that you can use the data at a subsequent stage like this. Now suppose we have a problem similar to the first example, where there are two machine types, but now we have temperature and pressure as factors, and we know that temperature and pressure cannot vary independently and that the nature of that dependence changes between machine types. We can create an initial space filling design and use the data filter to remove the infeasible combinations of factor settings separately for each machine type. Then we can use the candidate set designer to find the most efficient design for this situation. So now I've created my space filling design. It's got 1,000 runs, and I can bring up the global data filter on it and use it to shave off different combinations of temperature and pressure so that we can have separate constraints by machine type. So I use the lasso tool to cut off a corner in machine B. And I go back and I cut off another corner in machine B, so machine B is the machine that has kind of a wider operating region in temperature and pressure. Then we switch over to machine A. And we're just going to use the lasso tool to shave off the points that are outside its operating region. And we see that its operating region is a lot narrower than machine B's. And here's our combined design. From there we can load that back up into the custom designer. Put an RSM model there, then set our number of runs to 32, allowing covariate rows to be repeated. And it'll crank through. 
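For readers who want to script that filtering step rather than use the lasso interactively, here is a minimal JSL sketch of the same idea, with made-up column names and an invented feasibility rule standing in for the hand-drawn selections.

// Hypothetical sketch of the "filter design" idea: generate a large cloud of
// candidate points, then delete the combinations that are infeasible for each
// machine type. The feasibility rules below are invented for illustration;
// in the demo this carving is done interactively with the lasso tool.
dt = New Table( "Filtered Candidates",
	New Column( "Machine", Character, "Nominal" ),
	New Column( "Temperature", Numeric, "Continuous" ),
	New Column( "Pressure", Numeric, "Continuous" )
);
dt << Add Rows( 1000 );
For Each Row(  // dt is the current data table here
	:Machine = If( Random Uniform() < 0.5, "A", "B" );
	:Temperature = Random Uniform( 140, 180 );
	:Pressure = Random Uniform( 1, 10 );
);
// Machine A gets a narrower feasible region than machine B (illustrative rule only).
dt << Select Where(
	(:Machine == "A" & (:Temperature < 150 | :Temperature > 170 | :Pressure > 6)) |
	(:Machine == "B" & :Pressure > 2 + 0.15 * (:Temperature - 140))
);
dt << Delete Rows( dt << Get Selected Rows );
// The rows that remain form the candidate set to load into the custom designer as covariates.

The point is not the particular rule but that any constraint you can express as a row selection, by formula or by lasso, can be turned into a candidate set.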
Once it's done that, it selects all the points that were chosen by the candidate set designer. And here we can see the points that were chosen. They've been highlighted, and the original candidate points that were not selected are gray. We can bring up the new design in Fit Y by X, and we see a scatterplot where the machine A design points are in red. They're in the interior of the space, and then the type B runs are in blue. It had the wider operating region, and that's why we see these points out here, further out, for it. So we have quickly achieved a design with linear constraints that change with a categorical factor, without going through the annoying process of deriving the linear combination coefficients. We've simply used basic JMP 101 visualization and filtering tools. This idea generalizes to other nonlinear constraints and other complex situations fairly easily. So now we're going to use filtering and the Multivariate platform to set up a very unique new type of design that I assure you you have never seen before. Go to the lasso tool. We're going to cut out a very unusual constraint. And we're going to invert the selection. We're going to delete those rows. Then we can speed this up a little bit. We can go through and do the same thing for other combinations of X1 and the other variables, carving out a very unusually shaped candidate set. We can load this up into the custom designer, same thing as before. Bring our columns in as covariates, set up a design with all high-order interactions made if possible, with a hundred runs. And now we see our design for this very unusual constrained region that is optimal given these constraints. So I'll leave you with this image. I'm very excited to hear what you are able to do with the new candidate set designer. Hats off to the DOE team for adding this surprisingly useful and flexible new feature. Thank you.  
Vince Faller, Chief Software Engineer, Predictum  Wayne Levin, President, Predictum   This session will be of interest to users who work with JMP Scripting Language (JSL). Software engineers at Predictum use a continuous integration/continuous delivery (CI/CD) pipeline to manage their workflow in developing analytical applications that use JSL. The CI/CD pipeline extends the use of Hamcrest to perform hundreds of automated tests concurrently on multiple levels, which factor in different types of operating systems, software versions and other interoperability requirements. In this presentation, Vince will demonstrate the key components of Predictum’s DevOps environment and how they extend Hamcrest’s automated testing capabilities for continuous improvement in developing robust, reliable and sustainable applications that use JSL: Visual Studio Code with JSL extension – a single code editor to edit and run JSL commands and scripts in addition to other programming languages. GitLab – a management hub for code repositories, project management, and automation for testing and deployment. Continuous integration/continuous delivery (CI/CD) pipeline – a workflow for managing hundreds of automated tests using Hamcrest that are conducted on multiple operating systems, software versions and other interoperability requirements. Predictum System Framework (PSF) 2.0 – our library of functions used by all client projects, including custom platforms, integration with GitLab and the CI/CD pipeline, helper functions, and JSL workarounds.     Auto-generated transcript...   Speaker Transcript Wayne Levin Welcome to our session here on extending Hamcrest automated testing of JSL applications for continuous improvement. What we're going to show you here, our promise to you, is we're going to show you how you too can build productive, cost-effective, high-quality-assurance, highly reliable and supportable JMP-based mission-critical integrated analytical systems. Yeah, that's a lot to say, but that's what we're doing in this environment. We're quite pleased with it. We're really honored to be able to share it with you. So here's the agenda we'll follow here. A little introduction, myself, I'll do that in a moment, and just a little bit about Predictum, because you may not know too much about us: our background, the background of our JSL development infrastructure, a little bit of history involved with that. And then the results of the changes that we've been putting in place that we're here to share with you. Then we're going to do a demonstration and talk about what's next, what we have planned going forward, and then we'll open it up, finally, for any questions that you may have. So I'm Wayne Levin, that's me over here on the right. I'm the president of Predictum and I'm joined by Vince Faller. Vince is our chief software engineer, who's been leading this very important initiative. So just a little bit about us, right there. We're a JMP partner. We launched in 1992, so 29 years old. We do training in statistical methods and so on using JMP, consulting in those areas, and we spend an awful lot of time building and deploying integrated analytical applications and systems, hence why this effort was very important to us. We first delivered a JMP application with JMP 4.0 in the year 2000, yeah, indeed over 20 years ago, and we've been building larger systems since. Of course, back then it was just small little tools, but we started, I think, around JMP 8 or 9, building larger systems. 
So we've got quite a bit of history on this, over 10 years easily. So just a little bit of background... until about the second half of 2019, our development environment was really disparate, it was piecemeal. Project management was there, but again, everything was kind of broken up. We had different applications for version control and for managing time, you know, our developer time, and so on, and just project management generally. Developers were easily spending, and we'll talk about this, about half their time just doing routine mechanical things, like encrypting and packaging JMP add-ins. You know, maintaining configuration packages and separating the repositories, or what we generally call repos, for encrypted and unencrypted scripts. There was a lot we had to think about that wasn't really development work. It was really work that developer talent was wasted on. We also had, like I said, we've been doing it a long time, even in 2019 we had easily over 10 years of legacy framework going all the way back even to JMP 5, and it was getting bloated and slow. And we know JMP has come a long way over the years. I mean, in JMP 9 we got namespaces, and JMP 14 introduced classes, and that's when Hamcrest began. And it was Hamcrest that really allowed us to go forward with this major initiative. So we began this major initiative back in August of 2019. And that's when we acquired our first GitLab licenses, and that's when the development of our new development architecture, there you go, started to take shape, and it's been improving ever since. Every month, basically, we've been adding and building on our capabilities to become more and more productive as we go forward. And that's continuing, so we actually consider this, if you will, a Lean type of effort. It really does follow Lean principles and it's accelerated our development. We have automated testing, thanks to this system, and Vince is going to show us that. And we have this little model here, test early and test often, and that's what we do. It supports reusing code, and we've redeveloped our Predictum System Framework. It's now 2.0. We've learned a lot from our earlier effort. Pretty much all of it's gone, and it's been replaced and expanded. And Vince will tell us more about that. Easily we have over a 50% increase in productivity, and I'm just going to say the developers are much happier. They're less frustrated. They're more focused on their work, I mean the real work that developers should be doing, not the tedious sort of stuff. There's still room for improvement, I'm going to say, so we're not done, and Vince will tell us more about that. We have development standards now, so we have style guides for functions, and all of our development is functionally based, you might say. Each function requires at least one Hamcrest test, and there are code reviews that the developers share with one another to ensure that we're following our standards. And it raises questions about how to enhance those standards, make them better. We also have these sort of fun sessions where developers are encouraged to break code, right, so they're called, like, these break-code challenges, or what have you. So it's become part of our modus operandi and it all fits right in with this development environment. It leads to, for example, further tests, further Hamcrest tests, to be added. 
We have one small, fairly small project that we did just over a year ago. We're going into a new phase of it. It's got well over 100 Hamcrest tests built into it, and they get run over and over and over again through the development process. So some other benefits are that it allows us to assign and track our resource allocation, like which developers are doing what. Everyone knows what everyone else is doing (continuous integration, continuous deployment, something like that), and code collisions are detected early. So if we have, and we do, multiple people working on some projects, then, you know, somebody's changing a function over here and it's going to collide with something that someone else is doing. We're going to find out much sooner. It also allows us to improve supportability across multiple staff. We can't have code dependent on a particular developer; we have to have code that any developer or support staff can support going forward. So that was an important objective of ours as well. And it does advance the whole quality assurance area just generally, including supporting, you know, FDA requirements concerning QA, things like validation, the IQ/OQ/PQ. So we're automating or semi-automating those tasks as well through this infrastructure. We use it internally and externally, so you may know we have some products out there, (???)Kobe sash lab but new ones spam well Kobe send spam(???), that are also talked about elsewhere at the JMP Discovery Europe conference in 2021. You might want to go check them out, but they're fairly large code bases and they're all developed this way; in other words, we eat our own dog food, if you know that expression. But we also use it with all of our client development, so this is something that's important to our clients, because we're building applications that they're going to be dependent on. And so we need to have the infrastructure that allows us to be dependable, and anyway, that's a big part of this. I mentioned the Predictum System Framework. You can see some snippets of it here. It's right within the scripting index, and you know, we see the arguments and the examples and all that. We built all that in, and over 95% of them have Hamcrest tests associated with them. Of course, our goal is to make sure that all of them do, and we're getting there. We're getting there. This framework is actually part of our infrastructure here. That's one of the important elements of it. Another is Hamcrest, the ability to do the unit testing. And there's a slide at the end which will give you a link into the Community where you can learn more about Hamcrest. This is a development that was brought to us by JMP, back in JMP 14, as I mentioned a few minutes ago. GitLab is a big part of this; that gives us the project management repository, the CI/CD pipeline, etc. And also there's a Visual Studio Code extension for JSL that we created, and you see five stars there because it was given five stars on the Visual Studio... I'm not sure what we call that. Vince, maybe you can tell us, the store, what have you. It's been downloaded hundreds of times and we've been updating it regularly. So what I'm going to do now is I'm going to pass this over to Vince Faller. 
Vince is, again, our chief software engineer. Vince led this initiative, starting in August 2019, as I said. It was a lot of hard work and the hard work continues. We're all, in the company, very grateful for Vince and his leadership here. So with that said, Vince, why don't you take it from here? I'm gonna... I'm... Vince Faller Sharing. So Wayne said Hamcrest a bunch of times. For people that don't know what Hamcrest is, it is an add-in created by JMP. Justin Chilton and Evan McCorkle were leading it. It's just a unit testing library that lets you run tests and get the results in an automated way. It really started the ball rolling on us being able to even do this, hence why it's called extending. I'm going to be showing some stuff with my screen. I work pretty much exclusively in the VSCode extension that we built. This is VSCode. We do this because it has a lot of built-in functionality or extendable functionality that we don't have to write, like git integration, GitLab integration. Here you can see this is a JSL script and it reads it just fine. If you want to get it, if you're familiar with VSCode, it's just a lightweight text editor. You just type in JMP and you'll see it. It's the only one. But we'll go to what we're doing. So. For any code change we make, there is a pipeline run. We'll just kind of show what it does. So if I change the README file to say this is a demo for Discovery 2021, I'm just going to commit that. If you don't know git, committing is just saying I want a snapshot of exactly where we are at the moment, and then you push it to the repo and it's saved on the server. Happy day. Commit message. More readme info. And I can just do git push, because VSCode is awesome. Pipeline demo. So now I've pushed it. There is going to be a pipeline running. I can just go down here and click this and it will give me my merge request. So now the pipeline has started running. I can check the status of the pipeline. What it's doing right now is it's going to go through and check that it has the required Hamcrest files. We have some requirements that we enforce so that we can make sure that we're doing our jobs well. And then it's done. I'm going to press encrypt. Now encrypt is going to take the whole package and encrypt it. If we go over here, this is just a VM somewhere. It should start running in a second. So it's just going through all the code, writing all the encrypted passwords, going through, clicking all that stuff. If you've ever tried to encrypt multiple scripts at the same time, you'll probably know that that's a pain, so we automated it so that we don't have to do this because, as Wayne said, it was taking a lot of our time. Like, if we have 100 scripts to go through and encrypt every single one of them every time we want to do any release, it was awful. Because we have to have our code encrypted because, yeah sorry, opinion, all right, I can stop sharing that. Ah. So that's gonna run. It should finish pretty soon. Then it will go through and stage it, and the staging basically takes all of the sources of information we want, as in our documentation, as in anything else we've written, and it renders them into the form that we want in the add-in, because, much like the rest of GitHub and GitLab, most of our documentation is written in Markdown and then we render it into whatever we need. I don't need to show the rest of this, but yeah. So it's passing. It's going to go. We'll go back to VSCode. So. 
If we were to change... so this is just a single function. If I go in here, like, if I were to run this... JSL, run current selection. So. You can see that it came back... all that it's trying to do is open Big Class, run a fit line, and get the equation. It's returning the equation. And you can actually see it ran over here as well. But. So this could use some more documentation. And we're like, oh, we don't actually want this data table open. But let's just run this real quick. And say, no, this isn't a good return; it returns the equation in all caps, apparently. So if I stage that. Better documentation. Push. Again back to here. So, again it's pushing. This is another pipeline. It's just running a bunch of PowerShell scripts in order, depending on however we set it up. But you'll notice this pipeline has more stages. In an effort to help be able to scale this, we only test the JSL minimally at first, and then, as it passes, we allow it to test further. And we only test it if there are JSL files that have changed. But we can go through this. It will run and it will tell us where it is in the testing, just in case the testing freezes. You know, if you have a modal dialog box that just won't close, obviously JMP isn't going to keep doing anything after that. But you can see, it did a bunch of stuff, yeah, awesome. I'm done. Exciting. Refresh that. Get a little green checkmark. And we could go, okay, run everything now. It would go through, test everything, then encrypt it, then test the encrypted version, basically the actual thing that we're going to make the add-in of, and then stage it again, package it for us, create the actual add-in that we would give to a customer. I'm not going to do that real quick because it takes a minute. But let's say we go in here and we're, like, oh, well, I really want to close this data table. I don't know why I commented it out in the first place. I don't think it should be open, because I'm not using it anymore; we don't want that. We'll say okay. Close the dt. Again push. Now, this could all be done manually on my computer with Hamcrest. But you know, sometimes a developer will push stuff and not run all of their Hamcrest tests for everything on their computer, and the entire purpose of this is to catch that. It forces us to do our jobs a little better. And yeah. Keep clicking this button. I could just open that, but it's fine. So now you'll see it's running the pipeline again. Go to the pipeline. And I'm just going to keep saying this for repetition. We're just going through, testing, then encrypting, then testing again, because sometimes encryption introduces its own world of problems, if anybody's ever done encrypting. Run, run, run, run, run. And then, oh, we got a throw. Would you look at that? I'm not trying to be deadpan, but you know. So if we were to mark this as ready and say, yeah, we're done, we'd see, oh, well, that test didn't pass. Now we could download why the test didn't pass in the artifacts. And this will open a JUnit file that I'm just going to pull out here. It will also render it in GitLab, which might be easier, but for now we'll just do this. Eventually. Minimize everything. Now come on. So, we can see that something happened with R squared and it failed, inside of boo. So we can come here and say, why is there something in boo that is causing this to fail? We see, oh, somebody called our equation function and then they just assumed that the data table was there. 
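To make the foo/boo discussion a bit more concrete, here is a rough, hypothetical sketch of the kind of function and Hamcrest check being described. This is not Predictum's actual code, the report navigation may differ by JMP version, and it assumes the Hamcrest for JSL add-in is installed, which provides ut assert that() and matchers such as ut equal to().

// Hypothetical example function: open Big Class, fit a line of weight on height,
// and return the fitted slope. Closing the table at the end is the kind of change
// that can break another function that assumed the table was still open.
get slope = Function( {},
	{Default Local},
	dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
	fit = dt << Bivariate( Y( :weight ), X( :height ), Fit Line );
	slope = Report( fit )["Parameter Estimates"][Number Col Box( 1 )] << Get( 2 );
	fit << Close Window;
	Close( dt, No Save );  // tidy up so callers are not left with the table open
	slope;
);
// A minimal Hamcrest-style assertion: the slope should come back positive.
ut assert that( Expr( get slope() > 0 ), ut equal to( 1 ) );

In the pipeline, a failing assertion like this is what surfaces as the red job and the JUnit artifact Vince downloads.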
So because something I changed broke somebody else's code, as if that would ever happen, we're having that problem. Where did you go? Here we go. So that's the main purpose of everything we're doing here: to be able to catch the fact that I changed something and I broke somebody else's stuff. So I could go through, look at what boo does, and say, oh well, maybe I should just open Big Class myself. Yeah, cool. Well, if I save that, I should probably make it better. Open Big Class myself. I'll stage that. Open Big Class. Git push. And again, just show the old pipeline. Now this should take not too long. So we're going to go in here. We only test on one JMP version automatically at first, but you can see, we only test on one. Then it waits for the developer to say, yeah, I'm done and everything looks good, continue. We do that for resource reasons, because these are running on VMs that are just chugging all the time, and we have multiple developers who are all using these systems. We're also... You can see, this one is actually a Docker system; we're containerizing these. Well, we're in the process of containerizing these. We have them working, but we don't have all the versions yet. But we run 14.3, at least for this project; we run 14.3, 15, 15.1, and that should work. Let's just revert things. Because that, you know, works. Probably should have done a classic...but it's fine. So yeah. We're going to test. I feel like I keep saying this over and over. We're going to test everything. We'll actually let this one run to show you kind of the end result of what we get. It should only take a little bit. And so we'll test this, make sure it's going, and you can see the logs. We're getting decent information out of what is happening and where it is; for example, it'll tell you the runner that is running. I'm only running on Windows right now. Again, this is a demo and all that, but we should be able to run more. While that's running, I'll just talk about VSCode some more. In VSCode, there are also snippets and things, so if you want to make a function, it will create all of the function information for you. We use Natural Docs, again, that was stolen from the Hamcrest team, as our development documentation. So it'll just put everything in a Natural Docs form. So again, the idea is helping us do our jobs and forcing us to do our jobs a little better, with a little more gusto. Wayne Levin For the documentation? Vince Faller So that's for the documentation, yeah. Wayne Levin As we're developing, we're documenting at the same time. Vince Faller Yep. Absolutely. You know, it also has for loops, while loops, For Each Row, stuff like that. Are we...is this done yet? It's probably done, yep. So we get our green checkmark. Now it's going to run on all of the systems. If we can go back to here, you'll just see it. Open JMP. It'll run some tests, probably will open Big Class. Then close itself all down. Wayne Levin So we're doing this largely because many of our clients have different versions of JMP deployed and they want a particular add-in, but they're running it, they have, you know, just different versions out there in the field. We also test against the early adopter versions of JMP, which is a service to JMP because we report bugs. But also for the clients, it's helpful because then they know that they can upgrade to the new version of JMP. They know that the applications that we built for them have been tested. 
And that's just routine for us. Good. Vince Faller You're done. You're done. You're done. Change to... I can talk about... And this is just going to run; we can movie magic this if you want to, Meg, just to make it run faster. Basically, I just want to get to staging, but it takes a second. Is there anything else you have to say, Wayne, about it? Cool. I'll put that... Something I can say: when we're staging, we also have our documentation in MkDocs. So it'll actually run the MkDocs version, render it, put the help into the help files, and basically be able to create a release for us, so that we don't have to deal with it. Because creating releases is just a lot of effort. Encrypting. It's almost done. Probably should just have had one preloaded. Live demos, what are you gonna do. Run. Oh, one thing I definitely want to do. So, the last thing that the pipeline actually does is check that we actually logged our time, because, you know, if we don't actually record our time spent, we don't get paid, so it forces us to do it. Great, great time. Vince Faller So the job would have failed without that. I can just show some jobs. Trying. That's the Docker one. We don't want that. So you can see that gave us our successes. No failures. No unexpected throws. That's all stuff from Hamcrest. Come on. One more. Okay, got to staging. One thing that it does is it creates the repositories. It creates them fresh every time. So it tries to keep it in a sort of stateless way. Okay, we can download the artifacts now. And now we should have this pipeline demo. I really wish it would have just gone there. What. Why is Internet Explorer up? So now you'll see pipeline demo is a JMP add-in. If we unzip it... if you didn't know, a JMP add-in is just a zip file. If we look at that now, you can see that it has all of our scripts in it; it has our foo, it has our bar. If we open those, you can see it's an encrypted file. So this is basically what we would be able to give to the customer, without so much mechanical work. Wayne mentioned that it means less frustrated developers, and personally, I think that's an understatement, because doing this over and over was very frustrating before we got this in place, and this has helped a bunch. Wayne Levin Now, about the encryption: when you're delivering an add-in for use by users within a company, you typically don't want, for security reasons and so on, you don't want anyone to be able to go in and deal with the code. You know, that sort of thing, so we may deliver code unencrypted so, you know, the client has their own code unencrypted, but for delivery to the end user, you typically want everything encrypted, just so it can't be tampered with. Just one of those sort of things. Vince Faller Yep, and that is the end of my demo. Wayne, if you want to take it back for the wrap-up. Wayne Levin Yeah, terrific. Sure, thanks very much for that, Vince. So there are a lot of moving parts in this whole system, so it's, you know, basically making sure that we've got code being developed by multiple users that is not colliding. We're building in the documentation at the same time. And actually, the documentation gets deployed with the application and we don't have to weave that in. We set the infrastructure up so that it's automatically taken care of. We can update that along with the code comprehensively, simultaneously, if you will. 
The Hamcrest tests that are going on... each one of those functions that are written has expected results, if you will. So they get compared, and so we saw, briefly, there was, I guess, some problem with that equation there. An R square or whatever came back with a different value, so it broke; in other words, it said hey, something's not right here; I was expecting this output from the function for a use case. So that's one of the things that we get from clients: we build up a pool of use cases that get turned into Hamcrest tests and away we go. There are some other slides here that are available to you if you go and download the slides. So I'll leave that available for you, and here's a little picture of the pipeline that we're employing and a little bit about code review activity for developers too, if you want to go back and forth with it. Vince, do you want to add anything here about how code review and approval takes place? Vince Faller Yeah, so inside of the merge request it will have the JSL code and the diffs of the code. And again, a big thank you to the people who did Hamcrest as well, because they also started a lexer for GitHub and GitLab to be able to read JSL, so actually this is inside of GitLab. And it can also read the JSL. It doesn't execute it, but it has nice formatting. It's not all just white text; it's beautiful there. We will just go in, like in this screenshot: you click a line, you put in a comment that you want, and it becomes a reviewable task. So we try to do as much inside of GitLab as we can for transparency reasons, and once everything is closed out, you can say yep, my merge request is ready to go. Let's put it into the master branch, main branch. Wayne Levin Awesome. So it's really helping, you know, we're really defining coding standards, if you will, and I don't like the word enforcement, but that's what it amounts to. And it reduces variation. It makes it easier for multiple developers, if you will, to understand what others have done. And as we bring new developers on board, they come to understand the standard and they know what to look for, they know what to do. So it makes onboarding a lot easier, and again, everything's attached to everything here, so you know, supportability and so on. This is the slide I mentioned earlier, just for some resources, so we're using GitLab. I suppose the same principles apply to any git platform generally, like GitHub or what have you. Here's the Community link for Hamcrest. There was a talk in Tucson, that was in 2019, in the old days when we used to travel and get together. That was a lot of fun. And here's the marketplace link for Visual Studio Code. So as Vince said, yeah, we make a lot of use of that editor, as opposed to using the built-in JMP editor, just because it's all integrated. It's just all part of one big application development environment. And with that, on behalf of Vince and myself, I want to thank you for your interest in this, and we really want to thank the JMP team. Justin Chilton and company, I'll call out to you. If not for Hamcrest, we would not be on this. That was the missing piece, or the enabling piece, that really allowed us to take JSL development to, basically, the kinds of standards you expect in code development generally in industry. 
So we're really grateful for it, and I know that is propagated out with each application we've deployed. And at this point, Vince and I are happy to take any questions... send them to info@predictum.com and they'll get forwarded to us and we'll get back to you. But at this point, we'll open it up to Q&A.  
Jeremy Ash, JMP Analytics Software Tester, SAS   The Model Driven Multivariate Control Chart (MDMCC) platform enables users to build control charts based on PCA or PLS models. These can be used for fault detection and diagnosis of high dimensional data sets. We demonstrate MDMCC monitoring of a PLS model using the simulation of a real-world industrial chemical process: the Tennessee Eastman Process. During the simulation, quality and process variables are measured as a chemical reactor produces liquid products from gaseous reactants. We demonstrate how MDMCC can perform online monitoring by connecting JMP to an external database. Measuring product quality variables often involves a time delay before measurements are available, which can delay fault detection substantially. When MDMCC monitors a PLS model, the variation of product quality variables is monitored as a function of process variables. Since process variables are often more readily available, this can aid in the early detection of faults. We also demonstrate fault diagnosis in an offline setting. This often involves switching between multivariate control charts, univariate control charts and diagnostic plots. MDMCC provides a user-friendly interface to move between these plots.     Auto-generated transcript...   Speaker Transcript Hello, I'm Jeremy Ash. I'm a statistician in JMP R&D. My job primarily consists of testing the multivariate statistics platforms in JMP, but I also help research and evaluate methodology. Today I'm going to be analyzing the Tennessee Eastman process using some statistical process control methods in JMP. I'm going to be paying particular attention to the model driven multivariate control chart platform, which is a new addition to JMP 15. These data provide an opportunity to showcase a number of the platform's features. And just as a quick disclaimer, this is similar to my Discovery Americas talk. We realized that Europe hadn't seen a model driven multivariate control chart talk due to all the craziness around COVID, so I decided to focus on the basics. But there is some new material at the end of the talk. I'll briefly cover a few additional example analyses that I put on the Community page for the talk. First, I'll assume some knowledge of statistical process control in this talk. The main thing it would be helpful to know about is control charts. If you're not familiar with these, these are charts used to monitor complex industrial systems to determine when they deviate from normal operating conditions. I'm not going to have much time to go into the methodology of model driven multivariate control chart, so I'll refer you to these other great talks that are freely available on the JMP Community if you want more details. I should also say that Jianfeng Ding was the primary developer of the model driven multivariate control chart, in collaboration with Chris Gotwalt, and that Tonya Mauldin and I were testers. The focus of this talk will be using multivariate control charts to monitor a realistic industrial process; another novel aspect will be using control charts for online process monitoring. This means we'll be monitoring data continuously as it's added to a database and detecting faults in real time. So I'm going to start off with the obligatory slide on the advantages of multivariate control charts. So why not use univariate control charts? 
There are a number of excellent options in JMP. Univariate control charts are excellent tools for analyzing a few variables at a time. However, quality control data are often high dimensional, and the number of control charts you need to look at can quickly become overwhelming. Multivariate control charts can summarize a high dimensional process in just a couple of control charts, so that's a key advantage. But that's not to say that univariate control charts aren't useful in this setting. You'll see throughout the talk that fault diagnosis often involves switching between multivariate and univariate charts. Multivariate control charts give you a sense of the overall health of the process, while univariate charts allow you to monitor specific aspects of the process. So the information is complementary. One of the goals of model driven multivariate control chart is to provide some useful tools for switching between these two types of charts. One disadvantage of univariate charts is that observations can appear to be in control when they're actually out of control in the multivariate sense, and these plots show what I mean by this. The univariate control charts for oil and density show the two observations in red as in control. However, oil and density are highly correlated, and both observations are out of control in the multivariate sense, especially observation 51, which clearly violates the correlation structure of the two variables. So multivariate control charts can pick up on these types of outliers, while univariate control charts can't. Model driven multivariate control chart uses projection methods to construct the charts. I'm going to start by explaining PCA because it's easy to build up from there. PCA reduces the dimensionality of the process by projecting the data onto a low dimensional surface. This is shown in the picture on the right. We have P process variables and N observations, and the loading vectors in the P matrix give the coefficients for linear combinations of our X variables that result in score variables with dimension A, where the dimension A is much less than P. And then this is shown in the equations on the left here. X can be predicted as a function of the scores and loadings, where E is the prediction error. These scores are selected to minimize the prediction error, and another way to think about this is that you're maximizing the amount of variance explained in the X matrix. PLS is a more suitable projection method when you have a set of process variables and a set of quality variables. You really want to ensure that the quality variables are kept in control, but these variables are often expensive or time consuming to collect. The plant could be making product with out-of-control quality for a long time before a fault is detected. So PLS models allow you to monitor your quality variables as a function of your process variables, and you can see that the PLS models find the score variables that maximize the amount of variation explained in the quality variables. These process variables are often cheaper or more readily available, so PLS can enable you to detect faults in quality early and make your process monitoring cheaper. From here on out I'm going to focus on PLS models because they're more appropriate for the example.   
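Written out, the decomposition being described is the standard projection model (my notation, following the talk's use of T for scores and P for loadings, with the caveat that the letter P is also used for the number of process variables):

X = T P^{\top} + E

where X is the N x P matrix of process variables, T is the N x A matrix of scores, P is the P x A matrix of loadings, and E is the N x P matrix of prediction errors, with A much smaller than P. For PLS, the scores are additionally chosen so that they explain the quality variables Y well, giving the companion relation Y = T Q^{\top} + F for a quality loading matrix Q and residuals F.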
So the PLS model partitions your data into two components. The first component is the model component. This gives the predicted values of your process variables. Another way to think about it is that your data have been projected into the model plane defined by your score variables, and T squared monitors the variation of your data within this model plane. And the second component is the error component. This is the distance between your original data and the predicted data, and squared prediction error (SPE) charts monitor this variation. An alternative metric we provide is the distance to model X plane, or DModX. This is just a normalized alternative to SPE that some people prefer. The last concept that's important to understand for the demo is the distinction between historical and current data. Historical data are typically collected when the process was known to be in control. These data are used to build the PLS model and define the normal process variation so that a control limit can be obtained. And current data are assigned scores based on the model, but are independent of the model. Another way to think about this is that we have training and test sets. The T squared control limit is lower for the training data because we expect less variability for the observations used to train the model, whereas there's greater variability in T squared when the model generalizes to a test set. Fortunately, the theory for the variance of T squared has been worked out, so we can get these control limits based on some distributional assumptions. In the demo we'll be monitoring the Tennessee Eastman process. I'm going to present a short introduction to these data. This is a simulation of a chemical process developed by Downs and Vogel, two chemists at Eastman Chemical. It was originally written in Fortran, but there are wrappers for Matlab and Python now. I just wanted to note that while this data set was generated in the '90s, it's still one of the primary data sets used to benchmark multivariate control methods in the literature. It covers the main tasks of multivariate control well, and there is an impressive amount of realism in the simulation. And the simulation is based on an industrial process that's still relevant today. So the data were manipulated to protect proprietary information. The simulated process is the production of two liquid products from gaseous reactants within a chemical plant. And F here is a byproduct that will need to be siphoned off from the desired product. And that's about all I'll say about that. So the process diagram looks complicated, but it really isn't that bad, so I'll walk you through it. Gaseous reactants A, D, and E flow into the reactor here. The reaction occurs and the product leaves as a gas. It's then cooled and condensed into liquid in the condenser. Then a vapor-liquid separator recycles any remaining vapor and sends it back to the reactor through a compressor, and the byproduct and the inert chemical B are purged in the purge stream, and that's to prevent any accumulation. The liquid product is pumped through a stripper, where the remaining reactants are stripped off and then sent back to the reactor. And then finally, the purified liquid product exits the process.   
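For reference, the usual definitions of these two statistics in PCA/PLS-based monitoring are below; these are the standard textbook forms, and the platform's exact scaling of the limits may differ. For observation i with score vector t_i and residual vector e_i = x_i - \hat{x}_i:

T^2_i = \sum_{a=1}^{A} t_{ia}^2 / s_a^2,    \mathrm{SPE}_i = \sum_{j=1}^{P} e_{ij}^2 = \lVert x_i - \hat{x}_i \rVert^2

where s_a^2 is the variance of the a-th score estimated from the historical data. A large T^2 flags unusual variation within the model plane; a large SPE flags observations that fall far from the model plane itself.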
The first set of variables being monitored are the manipulated variables. These look like bow ties in the diagram; I think they're actually meant to be valves, and the manipulated variables mostly control the flow rate through different streams of the process. These variables can be set to any values within limits and have some Gaussian noise. The manipulated variables can be sampled at any rate, but we use the default 3-minute sampling interval. Some examples of the manipulated variables are the valves that control the flow of reactants into the reactor. Another example is a valve that controls the flow of steam into the stripper. And another is a valve that controls the flow of coolant into the reactor. The next set of variables are the measurement variables. These are shown as circles in the diagram. They were also sampled at three minute intervals. The difference between manipulated variables and measurement variables is that the measurement variables can't be manipulated in the simulation. Our quality variables will be the percent composition of the two liquid products, and you can see the analyzer measuring the products here. These variables are sampled with a considerable time delay, so we're looking at the purge stream instead of the exit stream, because these data are available earlier. And we'll use a PLS model to monitor process variables as a proxy for these variables, because the process variables have less delay and a faster sampling rate. So that should be enough background on the data. In total there are 33 process variables and two quality variables. The process of collecting the variables is simulated with a set of differential equations. This is just a simulation, but as you can see, a considerable amount of care went into modeling this after a real-world process. Here is an overview of the demo I'm about to show you. We will collect data on our process and store these data in a database. I wanted to have an example that was easy to share, so I'll be using a SQLite database, but the workflow is relevant to most types of databases, since most support ODBC connections. Once JMP forms an ODBC connection with the database, JMP can periodically check for new observations and add them to a data table. If we have a model driven multivariate control chart report open with automatic recalc turned on, we have a mechanism for updating the control charts as new data come in. The whole process of adding data to the database would likely be going on on a separate computer from the computer that's doing the monitoring, so I have two sessions of JMP open to emulate this. Both sessions have their own journal in the materials on the Community page; the session adding new simulated data to the database will be called the Streaming Session, and the session updating the reports as new data come in will be called the Monitoring Session. One thing I really liked about the Downs and Vogel paper was that they didn't provide a single metric to evaluate the control of the process. I have a quote from the paper here: "We felt that the tradeoffs among the possible control strategies and techniques involved much more than a mathematical expression." So here are some of the goals they listed in their paper, which are relevant to our problem. 
They wanted to maintain the process variables at desired values, they wanted to minimize variability of product quality during disturbances, and they wanted to recover quickly and smoothly from disturbances. So we'll see how well our process achieves these goals with our monitoring methods.
To start off in the Monitoring Session journal, I'll show you our first data set. The data table contains all of the variables I introduced earlier: the first variables are the measurement variables, the second are the composition (quality) variables, and the third are the manipulated variables. The script up here will fit a PLS model. It excludes the last 100 rows as a test set. Just as a reminder, the model predicts the two product composition variables as a function of the process variables. If you have JMP Pro, there have been some speed improvements to PLS in JMP 16. PLS now has a fast SVD option; you can switch to the classic algorithm in the red triangle menu. There have also been a number of performance improvements under the hood, mostly relevant for data sets with a large number of observations, which is common in the multivariate process monitoring setting. But PLS is not the focus of the talk, so I've already fit the model and output the score columns, and you can see them here. One reason that Model Driven Multivariate Control Chart was designed the way it is: imagine you're a statistician and you want to share your model with an engineer so they can construct control charts. All you need to do is provide the data table with these formula columns; you don't need to share all the gory details of how you fit your model.
Next, I'll provide the score columns to Model Driven Multivariate Control Chart and drag them over here. On the left you can see the two types of control charts, T squared and SPE. There are 860 observations that were used to estimate the model, and these are labeled as historical. The hundred that were left out as a test set are your current data. In the limit summaries you can see the number of points that are out of control and the significance level; if you want to change the significance level, you can do it up here in the red triangle menu. Because the reactor is in normal operating conditions, we expect no observations to be out of control, but we have a few false positives here because we haven't made any adjustments for multiple comparisons. As far as I can tell, it's uncommon to do this in multivariate control charts; I suppose you have higher power to detect out-of-control signals without a correction. In control chart lingo, this means your out-of-control average run length is kept low.
On the right we also have contribution plots; the observations are on the Y axis and the variables on the X axis, and a contribution is expressed as a proportion. At the bottom we have score plots. Right now I'm plotting the first score dimension versus the second score dimension, but you can look at any combination of score dimensions using these dropdown menus or the arrow buttons.
OK, so I think we're oriented to the report. I'm now going to switch over to the scripts I've used to stream data into the database that the report is monitoring.
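Before moving on to those scripts: the exact launch script from the Community materials isn't reproduced here, but the snippet below is a rough JSL sketch of that "hand the saved score columns to the platform" step. The score column names and the Process role shown are assumptions for illustration, not something taken from the talk.

```jsl
// Rough sketch only (column names are hypothetical): launch the platform on
// the score formula columns saved from the PLS fit, so the engineer never
// needs the details of how the model was built.
Model Driven Multivariate Control Chart(
	Process( :X Score 1, :X Score 2, :X Score 3 )
);
```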
In order to do anything for this example, you'll need to have a SQLite ODBC driver installed on your computer. This is much easier to do on a Windows computer, which is what you're often using when actually connecting to a database. The process on the Mac is more involved, but I've put some instructions on the Community page. I don't have time to talk about this, but I created the SQLite database I'll be using in JMP, and I plan to put some instructions on how to do this on the Community page as well. Hopefully that example is helpful to you if you're trying to do this with data of your own.
Next I'm going to show you the files that I put in the SQLite database. Here I have the historical data. This was used to construct the PLS model; there are 960 observations that are in control. Then I have the monitoring data, which at first just contains the historical data, but I'll gradually add new data to it. This is the data that the multivariate control chart will be monitoring. And then I've simulated new data already and added it to the data table here. These are another 960-odd observations where a fault is introduced at some time point. I wanted to have something that was easy to share, so I'm not going to run my simulation script and add to the database that way. We're just going to take observations from this new data table and move them over to the monitoring data table using some JSL and SQL statements. This is just an example emulating the process of new data coming into a database somehow; you might not actually do this with JMP, but this was an opportunity to show how you can do it with JSL. I'll clean up here.
Next I'll show you the streaming script. It's a simple script, so I'm going to walk you through it real quick. The first set of commands opens the new data table, which is in the SQLite database; it opens the table in the background so I don't have to deal with the window. Then I take pieces from this data table and add them to the monitoring data table. I call the pieces bites, and the bite size is 20. The next command connects to the database, which allows me to send the database SQL statements. The next bit of code iteratively sends SQL statements that insert new data into the monitoring data. I'm going to initialize K and show you the first iteration of this. It's a simple INSERT INTO statement that inserts the first 20 observations into the table. The print statement is commented out so that the code runs faster, and I also have a wait statement to slow things down slightly so that we can see the progression in the control chart; it would just go too fast if I didn't slow it down.
So next I'm going to move over to the Monitoring Session to show you the scripts that will update the report as new data come in. This first script is a simple script that will check the database every 0.2 seconds for new observations and add them to the JMP table. Since the report has automatic recalc turned on, the report will update whenever new data are added. And I should add that, realistically, you probably wouldn't use a script that just iterates like this.
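To make those two pieces concrete, here are minimal JSL sketches of the streaming and polling ideas. The DSN, table names, and row counts are hypothetical placeholders; the actual scripts in the Community materials build the INSERT values from the JMP table and differ in the details.

```jsl
// Streaming Session sketch (hypothetical names): push "bites" of 20 rows at
// a time from the new-data table into the monitoring table with SQL INSERTs.
biteSize = 20;
nBites   = 48;                                   // e.g. 960 new rows / 20 per bite
dbc = Create Database Connection( "DSN=TEP;" );  // hypothetical data source name
For( k = 0, k < nBites, k++,
	sql = "INSERT INTO monitoring SELECT * FROM new_data "
		|| "WHERE rowid > "  || Char( k * biteSize )
		|| " AND rowid <= " || Char( (k + 1) * biteSize ) || ";";
	Execute SQL( dbc, sql );                     // send the statement to the database
	Wait( 1 );                                   // slow down so the chart updates visibly
);
Close Database Connection( dbc );
```

On the other side, the Monitoring Session script just re-queries the table on a short interval and appends anything it has not seen yet; with automatic recalc on, the open report redraws as soon as rows are appended.

```jsl
// Monitoring Session sketch (hypothetical names): poll the monitoring table
// and append any unseen rows to the open JMP table that feeds the report.
dt = Data Table( "Monitoring Data" );
For( i = 1, i <= 300, i++,                       // poll for a fixed number of cycles
	Wait( 0.2 );                                 // check every 0.2 seconds
	snap = Open Database(
		"DSN=TEP;",
		"SELECT * FROM monitoring WHERE rowid > " || Char( N Rows( dt ) ),
		"snapshot"
	);
	If( N Rows( snap ) > 0,
		dt << Concatenate( snap, Append to first table )
	);
	Close( snap, No Save );
);
```

The INSERT ... SELECT with a rowid filter is just a compact stand-in for building the INSERT values row by row from the JMP table, which is what the talk's script does.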
You'd probably use Task Scheduler on Windows or Automator on the Mac to better schedule runs of the script. There's also another script that will push the report to JMP Public whenever the report is updated, and I was really excited that this is possible with JMP 15. It enables any computer with a web browser to view updates to the control chart; you can even view the report on your smartphone, so this makes it really easy to share results across organizations. You can also use JMP Live if you want the reports to be on a restricted server. I'm not going to have time to go into this in this demo, but you can check out my Discovery Americas talk. Then finally, down here, there is a script that recreates the historical data in the data table if you want to run the example multiple times.
Alright, so next, after making sure that we have the historical data, I'm going to run the streaming script and see how the report updates. The data is in control at first, and then a fault is introduced, but there's a plantwide control system implemented in the simulation, and you can see how the control system eventually brings the process to a new equilibrium. We'll wait for it to finish here. If we zoom in, it seems like the process first went out of control around this time point, so I'm going to color it and label it so that it will show up in other plots. And then in the SPE plot, it looks like this observation is also out of control, but only slightly. If we zoom in on that time point in the contribution plots, you can see that there are many variables contributing to the out-of-control signal at first, but once the process reaches a new equilibrium, there are only two large contributors. I'm going to remove the heat maps now to clean up a bit.
You can hover over the point at which the process first went out of control and get a peek at the top ten contributing variables. This is great for giving you a quick overview of which variables are contributing most to the out-of-control signal. If I click on the plot, this will be appended to the fault diagnosis section. As you can see, there are several variables with large contributions, and I've just sorted on the contribution. For variables with red bars, the observation is out of control in the univariate control charts. You can see this by hovering over one of the bars; these graphlets are IR charts for an individual variable with a three-sigma control limit. You can see in the stripper pressure variable that the observation is out of control, but eventually the process is brought back under control, and this is the case for the other top contributors. I'll also show you the univariate control chart for one of the variables that stays in control. So there were many variables out of control in the process at the beginning, but the process eventually reaches a new equilibrium. To see the variables that contribute most to the shift in the process, we can use the mean contribution proportion plot. These plots show the average contribution that the variables have to T squared for the group I've selected. Here, I'll sort on these.
The only two variables with large contributions measure the rate of flow of reactant A in stream one, which is the flow of this reactant into the reactor. Both of these variables are measuring essentially the same thing, except one is a measurement variable and the other is a manipulated variable. You can see that there is a large step change in the flow rate, which is what I programmed in the simulation, so these contribution plots allow you to quickly identify the root cause. In my previous talk I showed many other ways to visualize and diagnose faults using tools in the score plot, including plotting the loadings on the score plots and doing some group comparisons; you can check out my Discovery Americas talk on the JMP Community for that. Instead, I'm going to spend the rest of this time introducing a few new examples, which I've put on the Community page for this talk.
There are 20 programmable faults in the Tennessee Eastman process, and they can be introduced in any combination. I've provided two other representative faults here. Fault 1, which I showed previously, was easy to detect because the out-of-control signal is so large and so many variables are involved. The focus of the previous demo was to show how to use the tools to identify faults out of a large number of variables, not necessarily to benchmark the methods. Fault 4, on the other hand, is a more subtle fault, and I'll show it to you here. The fault that's programmed is a sudden increase in the temperature in the reactor. This is compensated for by the control system by increasing the flow rate of coolant. You can see that variable picked up here, and you can see the shift in the contribution plots. You can also see that most other variables aren't affected by the fault; the spike in the temperature here is quickly brought back under control. Because most other variables aren't affected, this fault is hard to detect for some multivariate control methods, and it can be more difficult to diagnose.
The last fault I'll show you is Fault 11. Like Fault 4, it also involves the flow of coolant into the reactor, except now the fault introduces large oscillations in the flow rate, which we can see in the univariate control chart. This results in a fluctuation of the reactor temperature. The other variables aren't really affected, again, so this can be harder to detect for some methods. Some multivariate control methods can pick up on Fault 4 but not Fault 11, or vice versa, but our method was able to pick up on both.
And then finally, all the examples I created using the Tennessee Eastman process had faults that were apparent in both the T squared and SPE plots. To show some newer features in Model Driven Multivariate Control Chart, I wanted to show an example of a fault that appears in the SPE chart but not in T squared. To find a good example of this, I revisited a data set that Jianfeng Ding presented in her earlier talk, and I've provided a link to her talk in this journal. On her Community page she provides several useful examples that are also worth checking out. This is a data set from MacGregor's classic paper on multivariate control charts.
The data are process variables measured in a reactor producing polyethylene, and you can find more background in Jianfeng's talk. In this example, we have a process that went out of control; let me show you this. It goes out of control earlier in the SPE chart than in the T squared chart. If we look at the mean contribution plots for SPE, you can see that there is one variable with a large contribution that also shows a large shift in the univariate control chart, but there are also other variables with large contributions that are still in control in the univariate control charts. It's difficult to determine from the bar charts alone why these variables had large contributions. Large SPE values happen when new data don't follow the correlation structure of the historical data, which is often the case when new data are collected, and this means that the PLS model you trained is no longer applicable. From the bar charts, it's hard to know which pair of variables have had their correlation structure broken. So, new in 15.2, you can launch scatterplot matrices, and it's clear in the scatterplot matrix that the violation of correlations with Z2 is what's driving these large contributions.
OK, I'm going to switch back to the PowerPoint. Real quick, I'll summarize the key features of Model Driven Multivariate Control Chart that were shown in the demo. The platform is capable of performing both online fault detection and offline fault diagnosis. There are many methods provided in the platform for drilling down to the root cause of faults. I'm showing you here some plots from a popular book, Fault Detection and Diagnosis in Industrial Systems. Throughout the book, the authors demonstrate how one needs to use multivariate and univariate control charts side by side to get a sense of what's going on in a process, and one particularly useful feature in Model Driven Multivariate Control Chart is how interactive and user friendly it is to switch between these two types of charts.
And that's my talk. Here is my email if you have any further questions, and thanks to everyone who tuned in to watch this.
Stuart Little, Lead Research Scientist, Croda
This presentation will show how some of the tools available in JMP have been successfully used to visualize and model historic data within an energy technology application. The outputs from the resulting model were then used to inform the generation of a DOE-led synthesis plan. The result of this plan was a series of new materials that have all performed in line with the expectations of the model. Through this approach, a functional model of product performance has been successfully developed. This model, alongside the visualization capabilities of JMP, has allowed the business to begin to embrace a more structured approach to experimentation.
Auto-generated transcript.
Stuart Little: Hi everyone, and welcome to this talk about how JMP is being used at Croda to help drive new product development. So what we're going to cover today: firstly, some context on Croda, how we are using JMP and where we are on that journey, and a summary of the problem we're trying to solve. Once we've covered the problem, we'll move to JMP and look at how the tools and platforms in JMP have allowed easy data exploration and easy development of a structure-performance model. And then finally we'll wrap up by discussing the outcomes of this work and how, by doing this kind of research, we've been able to increase buy-in to the use of data and DOE techniques in the research side of the business.
So firstly, who we are as Croda. It's a question that does come up quite a lot, because we're a business-to-business entity. But as a business, Croda are the name behind a lot of high-performance ingredients and technologies, and behind a lot of the biggest and most successful brands across the world, across a range of markets. We create, make, and sell speciality chemicals. From the beginning of Croda, these have been predominantly sustainable materials. We started by making lanolin, which is from sheep's wool, and we continually build on that sustainability. Last year we made a public pledge to be climate, land, and people positive by 2030 and have signed up to the UN Sustainable Development Goals as part of our push to achieve this and become the most sustainable supplier of innovative ingredients across the world.
In terms of the markets we serve, we have a very big personal care business, where we deal with skincare, sun care, hair care, color cosmetics, and those kinds of traditional personal care products. In our life sciences business, our products and expertise help customers optimize their formulations and their active ingredient use; most recently, we entered into an agreement with Pfizer to provide materials that are going into their COVID-19 vaccine. Our industrial chemicals business is responsible for supplying technically differentiated, predominantly sustainable materials to a huge range of markets, many of which don't quite fit into anything else on this slide. And then finally we've got our performance technologies business. This covers, again, a lot of similar areas, providing high-performance answers across all of these. Today, in particular, we're talking about our energy technologies business, and specifically battery technology and high-performance electronics.
So, where we are with Croda and JMP: we've been using JMP for about two years, and we've had a lot of interest internally, but it's been harder to build confidence that these techniques have real value to research. To prove this, we've gone away and created a number of case studies that have been pretty successful on the whole; we've demonstrated the potential and some of the pitfalls within that. All of that has then led to a slightly bigger set of projects, one of which is the one we're going to talk to you about today: how do we improve the efficiency of electrical cooling systems?
The primary driver for this project is transport electrification, so that's battery vehicles. How do you maintain the battery properly? How do you make sure the motors are working at their optimum level? And how do you do that without electrocuting anyone? There's currently a set of cooling methods for these things, and our customers are certainly looking at how that can be improved, because the better the control of your battery cooling, for instance, the better battery capacity you have and the more consistent the range will be. And because this is critical and there are lots of different applications that are broadly similar, the really useful thing for us would be to build an understanding of these fluids by having some sort of data-led model, and that's where JMP came in.
So how can we do that? Well, the first thing we looked at was the current cooling methods. Batteries are predominantly air cooled or cold-plate cooled in the previous generation. With the electronics in the car, you have the opposite problem to the battery, but things tend to get too hot, so there we have heat sinks to try and take that energy away. And in electric motors, where we're trying to minimize the resistance, they tend to be jacketed with fluid. In all three of these cases, the incoming alternative method of cooling relies on fluid: direct immersion for batteries and electronics, and, for the electric motors, more of a flow.
So what does that fluid look like? Obviously, we're dealing with high voltages, so we have to have something that's not electrically conductive. It also needs to have a really high thermal conductivity, so that it can pull heat out of the electronics. And because these fluids need to be moved around the system, the viscosity has to be low. So we have practical physical constraints that have been introduced by the application itself. If you look at it in a bit more depth, the ability of the fluids to transfer heat is based predominantly on this equation. What this tells us is that there is a part we can control through the fluid, which is the heat transfer coefficient, and there is a part that is controlled by the engineering solution in the application: what's the area for cooling, and what are the temperatures of the surfaces that you're trying to cool? But in all cases, to get efficient heat transfer, we have to have a high heat transfer coefficient, and as that's the thing we can affect, that's where we looked. That heat transfer coefficient is defined by this equation in a simplistic way; there are other terms in there, but predominantly it's a function of density, thermal conductivity, and heat capacity, with the viscosity of the system also having an effect.
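The slide equations aren't reproduced in the transcript, but a commonly used form consistent with the description above (a reconstruction, not the exact slide) is Newton's law of cooling together with a figure-of-merit expression for the heat transfer coefficient of a single-phase coolant:

```latex
% Reconstruction consistent with the description above, not the exact slide:
% heat removed per unit time, and a figure-of-merit form for the heat
% transfer coefficient h of a single-phase coolant.
\[
  \dot{Q} \;=\; h \, A \,\bigl(T_{\text{surface}} - T_{\text{fluid}}\bigr),
  \qquad
  h \;\propto\; \frac{\rho^{\,a}\, k^{\,b}\, c_p^{\,c}}{\mu^{\,d}}
\]
% \rho: density, k: thermal conductivity, c_p: heat capacity, \mu: viscosity;
% the exponents a, b, c, d depend on the flow regime (a Mouromtseff-style
% figure of merit).
```

Read this way, the next point follows directly: to raise h you want higher density, thermal conductivity, and heat capacity, and lower viscosity.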
So, if we look specifically at the applications we're interested in, if we want to optimize our dielectric fluid, we need to increase the density, increase the thermal conductivity, and increase the heat capacity, and alongside that we need to reduce the viscosity of the fluid. These match up pretty well with the engineering challenges that we have, which is helpful. So from that, we knew what the target was, and what we really wanted to do was understand the relationship between structure and performance as a dielectric fluid.
Initially we proceeded in a fairly traditional way, and we started conducting a large-scale study measuring the physical properties of a lot of esters and a lot of other materials. And then, when we saw that data, we thought, well actually, this data exists, so why don't we use these data sets to try and build some models and see whether we can really understand that physical property to structure to performance relationship. So with that, we're just going to pop into JMP, so bear with me one second.
OK. The first thing that we did was collate that mix of historic data and data that was being obtained through targeted testing by the applications teams. Once we'd got that into one place, we examined it in JMP to understand whether there is a relationship, at a really simple level, between the physical properties we're measuring. Looking at that data set, the first port of call for me, as ever, is the distribution platform in JMP. It's a really easy way to see whether something that you want has any kind of vague pattern anywhere else. In this case, if you say we want everything that's got a high thermal conductivity, what we see is that those points are pretty stretched out across the other properties we've measured. So it doesn't really say, oh, there's a brilliant relationship, what you need is this, which is kind of what we expected, but it's nice to have a check. Similarly, if we then plot everything as scatterplots, what we see is a lot of noise; these lines of fit are just there for reference to show there isn't really any fit, and in no way am I claiming any correlation on these. And while that was disappointing, the fact that there isn't an obvious answer was expected.
Where it got interesting to us is, we said, well, we were expecting that there isn't a clear relationship between any of these factors, because if there was, it would have been obvious to the experienced scientists doing the work, and we would have known that. So then we said, well, what we do know is that these properties all have a relation to the structure. What happens if we calculate some physical parameters for these things and combine that with a number of structural identifiers and ways of looking at these molecules? What happens if we take that and add it to the test data? Can we then build some kind of model that starts being able to estimate structure and performance? So that's exactly what we did. In this case, what we see is that, again, if we use the multivariate platform as a quick look to see whether there's any correlation in some of these factors, there's clear differentiation in some cases between up and down, and maybe little hints of correlation, but nothing clear that says this is the one thing that you need.
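As a minimal JSL sketch of those two quick checks, the distribution screen and the multivariate correlation look, with hypothetical column names standing in for the measured properties (the real table and column names aren't shown in the talk):

```jsl
// Sketch with hypothetical column names: the same quick screening described
// above, done in JSL rather than interactively.
dt = Current Data Table();

// Distributions of the measured properties, to see whether selecting the
// high thermal conductivity materials lines up with any other property.
Distribution(
	Column( :Thermal Conductivity, :Density, :Heat Capacity, :Viscosity )
);

// Pairwise correlations and a scatterplot matrix of the same properties.
Multivariate(
	Y( :Thermal Conductivity, :Density, :Heat Capacity, :Viscosity ),
	Scatterplot Matrix( 1 )
);
```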
Again, this is what we expected. So then what we did was use the regression platforms in JMP to try and understand whether we could build a model, and what that relationship looks like. To do this, we randomly selected a number of rows using the row selection tools in JMP, generally pulling out five samples at random that weren't going to participate in the model, and then iteratively built up the models and refined them that way, so we always had a validation set from the initial data, just to check that what we were doing had any chance of success.
So then, if we just look at the 80 degree models, the first model that we came to was this one. Clearly, as we can see, there are a number of factors included in this model that make no sense from a statistical point of view, because they're just overfitting and they are simply not significant. However, these are fairly important in terms of describing the molecules that are in there, so as chemists, we kept this model. This is a model that allows molecules, if you like, to be designed for this application, even though we know it's overfitted, and we know that it's not really a valid model, because these terms are just driving the R squared up and up and up. We also built the model without those terms. This is a far better model in terms of estimating the performance of these things; the R squared is a touch lower, but all the terms that are in there have a significant impact on the performance. The downside of this model is that it doesn't really help us design any new chemistry. But in both cases, when we look at the predicted values against the actual measured values, we see a reasonable correlation between them; certainly, when we expect things to be high, they are. So that gave us some confidence that this model might actually perform for us. Then, in terms of how good this might be, we simply looked at the percentage difference between the predicted value and the actual measured value, and what we see is that they are almost universally within 10%, and predominantly within 5%, for either model. Again, across a range of different types of material, this gave us confidence that what we were seeing might be a real effect.
All of which is very nice, but is this just an effect of the data we've measured? So what we did was use the profiler platform in JMP to produce a shareable model that we could send around the project team, and we essentially set up a competition and said, look, whoever can find the highest thermal conductivity in this model from a molecule that could actually be made, wins. From that we had a list of about 14 materials back that looked promising. We had to cut a few out because the raw materials were impossible to source, so we ended up with about nine new materials that were synthesized and tested. Now, these materials were almost exclusively made up of materials the model hadn't really seen before; in some cases, part of the molecule would be the same, but they were quite distinct from the original materials. So once we'd made them and they had been tested, we put them back into the model, just to see what the predictive power of this model was like. Looking at that data, and given the differences of these materials, I was fully expecting this to break the model. However,
if we look at the predictions again, what we see is that the highlighted blue points are the new materials that were made. We deliberately picked a couple that were lower, just out of curiosity, just to check, and all the ones that we picked because we thought they would be high were high. So in the overfitted model, which had value from a designing-a-structure point of view, what we see is one outlier. In the model that was statistically reasonable, we actually see a much better fit overall. And that was edifying: we can't yet design a single molecule and say, here you go, off you pop, here's the one thing you need to make, but we can certainly direct synthetic chemists to the right sorts of materials to really drive projects forward.
So then, if we look again at these residuals, what we see is that for the statistically good model with no overfitting, everything was within 10% for all the new materials, which, for what we were trying to achieve, was good enough. There are a few in the overfitted model that were a little bit over 10%, but again, this is kind of what I would expect to see. It was nice that they were all in the right range, because it shows that this approach was having value, but it was also quite reassuring to find that they weren't all exactly right, because I think, had we produced nine materials and they'd all been within 1%, I'm not sure that people would have believed that either. So the fact is, we were getting a similar level of difference to the predictions for the materials we started with and the new materials that we made, so we started having some real confidence in this model.
And then, if we just go back to the slides for a second: what we can say is that the structure-performance relationship of these materials has been created in JMP using the regression platforms. We've used the visualization tools in JMP to be able to see that there are real benefits to doing this, and the model itself is being used to direct the synthesis of new materials in this project. It's being used to screen likely materials to test from things we already make. And there's an acceptable correlation in the results between the model and the new molecules we're making, all of which has given real confidence to this approach and has really allowed us to push this project further and split it out into specific target materials. So, in terms of new molecules, we've directed the synthesis of molecules with higher thermal conductivity. As you can see in this plot, all the new molecules are medium to high on that range of thermal conductivity, which is what we wanted to achieve from them. We demonstrated that we could target an improvement using data, and then verify that in the lab and make it. Where this project becomes harder still is that we're now trying to build similar models for all the other factors that influence the performance of these dielectric fluids, and then we will be trying to balance those models against each other to find the best outcomes.
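Going back to the modeling step described above, here is a minimal JSL sketch of the hold-out-five-rows-and-fit idea, again with hypothetical response and effect names; the actual models used calculated physical and structural descriptors as effects.

```jsl
// Sketch with hypothetical column names: exclude five randomly chosen rows
// so they play no part in the fit, then fit a least squares model for the
// 80 degree thermal conductivity response.
dt = Current Data Table();
dt << Clear Row States;
dt << Select Randomly( 5 );      // hold out five rows at random
dt << Exclude;                   // excluded rows are ignored by Fit Model
dt << Clear Select;

Fit Model(
	Y( :Thermal Conductivity 80C ),                    // hypothetical response
	Effects( :Density, :Heat Capacity, :Viscosity ),   // stand-ins for the descriptors
	Personality( "Standard Least Squares" ),
	Emphasis( "Minimal Report" ),
	Run()
);
```

The held-out rows still get predicted values from the saved prediction formula, which is what makes the percentage-difference check against the measured values straightforward.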
So all of that further development is ongoing, but that momentum has come purely from the ease of use of JMP and the platforms within it, to take a data set and, with a bit of domain knowledge, really push it forward and say, yep, here's a model that will help direct the synthesis for this project and subsequent projects in this area for Croda.
So then, just in conclusion: the data that we've obtained from testing has been used to successfully model the performance of these materials. It's not absolutely perfect, but it's good enough for what we want. The model demonstrates that there is a structure-performance relationship for esters (sorry, not sure why my taskbar is jumping around). The model has been used to predict materials of high thermal conductivity, and those predictions were then verified, initially by exclusion and then latterly by making new materials, really showing that this model holds for that type of chemistry. It's also demonstrated the possibility of tailoring the properties of, in this case, dielectrics, but also other materials if you build similar models, so that you can start being able to create specific materials for specific applications. And I think, most importantly for me, the real success of this work has built internal momentum to demonstrate that JMP is not a nice-to-have; it's a real platform to develop research, to very quickly look at data sets and say, is there something there? And with that, I'd just like to say thank you for watching. Obviously I can't answer any questions on a recording, but if you want to get in touch, feel free to comment in the Community. Thank you very much.