Wednesday, December 15, 2021
Level: Intermediate — To achieve process-improvement results with design of experiments, you first reduce the error within the target process, then plan and run an experiment suited to the objective, analyze the resulting data, and find the optimal solution. You apply this solution to the process to confirm the improvement, and if the improvement is insufficient, you revise the optimal solution so that the result is secured. This effort is not mere design of experiments but a comprehensive DOE that integrates the whole series of activities. To master it, it is important to experience the plan-do-check-act (PDCA) cycle of planning, execution, analysis, and improvement in a practical way, even if only virtually. Traditionally, teaching materials such as paper helicopters and table golf were used for this, but they had the drawback that a single person could not learn efficiently in a short time.

Recently, a flying-ball simulator that can flexibly generate process data containing error was developed as a JMP add-in, making it possible for one person to learn comprehensive DOE efficiently. In this study, we linked the flying-ball simulator with JMP's design of experiments platforms and devised an educational program for experiencing the PDCA cycle of comprehensive DOE, based on custom (optimal) designs. A distinctive feature is the use of multi-level replicated experiments so that the optimal solution can be corrected by regression, even for nonlinear responses.
Level: Beginner — This study analyzes a questionnaire administered at a neurology clinic to ensure the quality of care, with the aim of obtaining an overall picture of first-visit patient satisfaction.

In the preceding stage of this research, we showed that selective two-sided causal analysis based on priority focus can be applied when planning effective and efficient measures to maintain and improve first-visit patient satisfaction. Building on that, the clinic wanted a more comprehensive view of patient satisfaction so that it can continue to maintain and improve it. To that end, using the data collected so far, we restored the weakly influential cause-side items that were not selected in the selective two-sided causal analysis and carried out an analysis along the lines of SEM (structural equation modeling).

The main cause-side items selected by the selective two-sided causal analysis, and the factors behind them, achieve a reasonable degree of fit in the SEM analysis as well. Adding the low-influence cause-side items and their factors gives a more comprehensive view of the causal structure, but the fit indices deteriorate correspondingly.

Going forward, we plan to conduct a new patient-satisfaction survey to improve the fit indices as we move the results of this study closer to a full SEM. In this presentation, we introduce the findings obtained through this series of analyses.
Bradley Jones, JMP Distinguished Research Fellow, JMP

As is evident from its name, the original intended use of the Fit Definitive Screening platform was to analyze Definitive Screening Designs (DSDs). The surprise is that this platform can analyze a much broader class of designs than just DSDs. It turns out DSDs are a very special kind of foldover design, which is a standard textbook design used for factor screening. All that is needed for the Fit Definitive Screening platform to do its innovative analysis is a foldover design. This talk demonstrates how to make foldover designs using the Custom Design tool and then analyze them using the Fit Definitive Screening platform. Several examples illustrate this two-step procedure, and the analytical results are compared with more standard approaches that ignore the structure of the design.

Hello. My name is Bradley Jones. I'm the manager of the JMP DOE group, and what I want to talk to you about today is a surprising use of the Fit Definitive Screening platform. If you haven't created a Definitive Screening Design and analyzed it using this platform, then you wouldn't know where the platform is; I'll show you that. But what I'm going to show you is that you don't have to use the Fit Definitive Screening platform just to fit definitive screening designs. It can fit other things as well, and that's the surprise. I'll start out by talking about the main idea of this presentation, and then I'll review how Fit Definitive Screening works. Of course, I'm going to show you by hand, but the platform does all the work for you, so you never really have to do all this tedious stuff. Then I'll have a couple of examples of using Fit Definitive Screening to analyze designs that are not definitive screening designs, and I'll make some recommendations at the end.

To start out, here's a definitive screening design. If you look at the first pair of runs at the top, you can see that each value here is plus or minus one, and each value here is minus or plus one. What that is trying to show is that whatever value the top number has, the bottom number has the opposite value: if this is plus one, then that will be minus one; if this is minus one, then this will be plus one. The fact that all six pairs of runs in this example are mirror images like this means that the definitive screening design is a foldover design.

Let's think about what foldover designs are and what it means, in terms of properties, to have a foldover design. For any foldover design, the main effects and two-factor interactions are uncorrelated, which means they are statistically independent. Orthogonal foldover designs exist for every multiple of eight runs. However, orthogonal main effects are not as important as main effects being orthogonal to two-factor interactions. You may choose to allow some non-orthogonality among the main effects in order to get this nice property that main effects are not correlated with two-factor interactions, which means that if you have active two-factor interactions, they won't bias the estimates of any main effects.

I want to talk about how to make a foldover design in JMP, and you can do this in the Custom Designer. You open the Custom Designer and add two-level categorical or continuous factors; by default, continuous factors are always two levels. Then you choose a model that only has main effects, which again is the default.
Then you choose a number of runs that is a multiple of two, where the number of runs has to be at least twice the number of factors. Then, in the red triangle menu of the Custom Designer, you go to Optimality Criterion and, in the submenu, choose Make Alias Optimal Design. Then you're done: all you have to do is click Make Design. After you see the design, you can check that the Alias Matrix contains only zeros for the main effects and two-factor interactions. If the number of runs in your design is not a multiple of eight, you may see some correlations between two-factor interactions and the intercept, but the intercept estimate isn't really important for screening.

Let me give you a JMP demo of that process. The first example here is how to make a foldover. I'm going to create a six-factor custom design, with factors A through F, and you can see that the A through F main effects are the only things in the model, which is the default. Now I go to the red triangle menu and choose Alias Optimal, which is the last choice here. Then if I say, "Well, let's do 16 runs instead of 12," and click Make Design, it goes off and computes the alias-optimal design for six factors and 16 runs. Now I check by looking at the Alias Matrix, and you can see that everything in the Alias Matrix is zero. Another thing I can do is look at the color map on correlations, and I can see that the main effects are all white, and the rectangular area showing main effects versus two-factor interactions is also all white, which means that this design is, in fact, a foldover design, and I could use it to do a screening experiment. Let me go back to my slides.

All that the Fit Definitive Screening platform does is check that the design is a foldover. It doesn't actually require that what you have in the current table is a DSD. You can use Fit DSD to analyze any foldover design, and that's the surprise. That's the main idea here: first, that you can create foldover designs very simply using the Custom Designer, and second, that if you have a foldover design, you can use Fit Definitive Screening to analyze the data.

It turns out that since main effects and two-factor interactions are orthogonal to each other in a foldover design, you can split the response that you observed into two new responses. One response you use for identifying main effects, and you could call it YME. The other response you use to identify two-factor interactions, and you could call that Y2FI. Because the main effects are orthogonal to the two-factor interactions, the two columns that you create this way will be orthogonal to each other. The way you do it is to fit the main effects model with no intercept and save the predicted values of that model; you can call that column YME. The next thing you do is save the residuals from that fit, and those residuals are in the space of the two-factor interactions. Now, these actions are unnecessary to do yourself, but if you want to know what's behind the scenes, this lets you carry them out by hand. Of course, you can just use the Fit Definitive Screening platform, and it does this behind the scenes. Let me make a small digression.
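To make the two ideas above concrete, here is a minimal numpy sketch — not JMP's internal code, and starting from an arbitrary 8-run two-level design rather than an alias-optimal one — of (1) folding a design over by appending its sign-reversed runs, which makes every main effect exactly orthogonal to every two-factor interaction, and (2) splitting a simulated response into YME (predictions from a no-intercept main-effects fit) and Y2FI (the residuals), which come out orthogonal to each other. The factor count, run count, and simulated effects are illustrative assumptions.

```python
# Illustrative sketch only: foldover structure and the YME / Y2FI response split.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)

# Start from any 8-run, 6-factor two-level design (here: random +/-1 levels),
# then fold it over by appending the sign-reversed runs.
half = rng.choice([-1.0, 1.0], size=(8, 6))
X = np.vstack([half, -half])          # 16-run foldover design

# Every main-effect column is orthogonal to every two-factor-interaction column.
two_fi = np.column_stack([X[:, i] * X[:, j] for i, j in combinations(range(6), 2)])
print(np.abs(X.T @ two_fi).max())     # 0.0 for any foldover design

# Simulate a response with main effects of C, D, F plus a C*D interaction.
y = 3*X[:, 2] + 2*X[:, 3] + 1.5*X[:, 5] + 2.5*X[:, 2]*X[:, 3] + rng.normal(0, 0.3, 16)

# Split the response: fit main effects with no intercept, keep the
# predictions (YME) and the residuals (Y2FI).
beta_me, *_ = np.linalg.lstsq(X, y, rcond=None)
y_me = X @ beta_me                    # lives in the main-effect space
y_2fi = y - y_me                      # lives in the 2FI (plus noise) space
print(round(float(y_me @ y_2fi), 6))  # ~0: the two pieces are orthogonal
```

In practice you would build the design in the Custom Designer and let Fit Definitive Screening do the split for you; the sketch only shows that both properties follow from the foldover structure itself.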
I think it's valuable to use the model heredity assumption, which is that, generally speaking, a two-factor interaction is much more probable to be active if both of the main effects that compose that interaction are active themselves. For example, if the factor A and factor B main effects are both active, then you might want to consider fitting the AB two-factor interaction. Now, this is not a physical law; nothing makes it absolutely necessary that it hold. And yet empirical evidence has shown that such models are much more likely than models where interactions are active while the main effects are not. Everybody who has done a lot of experiments has counterexamples to this; all I'm saying is that those counterexamples are comparatively rare.

Why would you make this assumption? Here's the reason: if you use the heredity assumption, the set of possible models is much smaller than if you don't. In the example I showed you earlier with factors A through F, suppose it turned out that only factors C, D, and F were active. Then you would only consider the three two-factor interactions CD, CF, and DF. Since there are three of these interactions, there are two to the third, or eight, possible models: one with no interactions, three with one interaction, three with two interactions, and one with all three interactions. However, if you wanted to look at all the two-factor interactions among factors A through F, there are six choose two, or 15, possible two-factor interactions, which means there are two to the 15th, or more than 32,000, possible models. Sifting through all of those is a much harder model selection problem. If you can rely on the heredity assumption, you save yourself a lot of work and also a lot of ambiguity in making your model selections.

Going back to how you do this: you form the two-factor interactions involving the active main effects and then do stepwise regression up to the point where the mean squared error of the model is relatively small — if you have an estimate of sigma squared and the two are roughly comparable, that is the time to stop. Of course, this still isn't necessary to do by hand, because Fit Definitive Screening does it for you.

Let me show you a couple of examples of this process, starting with an example from Doug Montgomery's Design and Analysis of Experiments textbook, the eighth edition. It starts by running a resolution III fractional factorial with seven factors and eight runs. Let me show you that design and how it is analyzed. Here's the resolution III design, and if I just run Fit Screening, what you see is that B, D, and A are the active effects, and maybe G is marginally active but small compared to the effects of B, D, and A. But let's evaluate this design; I just click the Evaluate Design script in the table. If I look at the Alias Matrix, you can see that factor A is confounded with the BD interaction. You can learn the same thing by looking at the color map: the correlation between A and the BD interaction is one, which means I don't know whether what I'm seeing is actually the main effect of A, the two-factor interaction of B and D, or any linear combination of the two. What I have is an ambiguity, and I need to make more runs in order to resolve it.
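As a small illustration of the model-counting argument above — purely my own enumeration, not anything the platform does — here is the heredity-restricted candidate set for active main effects C, D, and F, compared with the unrestricted set over all 15 two-factor interactions among six factors.

```python
# Count candidate interaction models with and without the heredity assumption.
from itertools import combinations, chain

def powerset(items):
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

active = ["C", "D", "F"]
heredity_2fis = list(combinations(active, 2))        # CD, CF, DF
print(len(heredity_2fis), 2 ** len(heredity_2fis))   # 3 interactions -> 8 models

all_2fis = list(combinations("ABCDEF", 2))
print(len(all_2fis), 2 ** len(all_2fis))             # 15 interactions -> 32768 models

for model in powerset(heredity_2fis):                # the eight heredity-consistent models
    print(model)
```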
Going back to my example, what happens next in the textbook is that the design is folded over, so instead of eight runs there are 16 runs. Let me show you that example. Here's the folded-over design. If I run Evaluate Design and look at the Alias Matrix, I can see that the Alias Matrix is identically zero everywhere. I can learn the same thing by going to the color map on correlations: the main effects are orthogonal to each other, and all the main effects are orthogonal to all the two-factor interactions. I now know that I have a foldover design and that my main effects are not going to be biased by two-factor interactions.

I have data for the time it takes for the eye to focus. If I click on this script, I see the result of having done the foldover. What I see first is that B and D are the two main effects, as before, except that A is no longer there because, guess what, the BD interaction is massively significant. The true model is B and D plus the BD interaction. Now we can run this model, and we see first that our actual-by-predicted plot and our residuals all look good. Then, playing with the profiler, we can see that as I move B from one end to the other, the slope of the prediction line for the effect of D on time changes. That's the nature of interactions: when you have an interaction, the slope of one factor depends on the value of the other factor. So this is the setting you would use if you wanted to maximize the time it takes the eye to focus; generally you would want to minimize the time, and this would be the setting to use to minimize the eye focus time. That's the end of that example.

Let me go back to my slides for just one second and introduce the Peanut Solids example. This is an example that my friend Chris Nachtsheim actually did in a consulting environment, and we have it in the sample data library as the Peanut Solids definitive screening design experiment. What I did instead was create a two-level foldover design and use the same model to generate data for it. Let me show you that data, which is my second example here, the peanut example. Notice that I have pH, water temperature, extraction time, ratio, agitation speed, and two categorical factors — whether you hydrolyze the peanuts first and whether you presoak the peanuts — and what is being measured is the peanut solids. Notice also that the number of runs here is 22, which is not a multiple of eight, and therefore, when I look at Evaluate Design, I don't expect this design to be orthogonal for the main effects. When I look at the Alias Matrix, you can see that the intercept is aliased by a very small amount with any active two-factor interaction. Again, if I look at the correlation color map, you can see that the main effects are not orthogonal to each other, but their absolute correlations are very small, like 1/11. And again, because this area is white, main effects are orthogonal to two-factor interactions. This is what we wanted; this is what we have to have in order to use the Fit Definitive Screening platform. Now I run that platform, and what I see is that I have four active main effects, which means that I have as many as four choose two, or six, two-factor interactions to check.
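Here is a short numpy sketch of the aliasing being described, using the usual textbook generators for a 2^(7-4) resolution III design (D = AB, E = AC, F = BC, G = ABC); the design in the demo may use different generators, so treat this as an illustration of the idea rather than a reproduction of the same table. In the 8-run design the A column is identical to the B×D column, and after folding over that confounding disappears.

```python
# Aliasing in a resolution III design, and how folding over removes it.
import numpy as np
from itertools import product

base = np.array(list(product([-1, 1], repeat=3)), dtype=float)   # full factorial in A, B, C
A, B, C = base.T
design8 = np.column_stack([A, B, C, A*B, A*C, B*C, A*B*C])        # 8-run res III: D=AB, E=AC, F=BC, G=ABC

# In the 8-run design, the A column equals the B*D column: A and BD are confounded.
D = design8[:, 3]
print(np.allclose(design8[:, 0], B * D))                          # True

# Fold the design over (append the sign-reversed runs): main effects become
# orthogonal to all two-factor interactions, so A is no longer biased by BD.
design16 = np.vstack([design8, -design8])
A16, B16, D16 = design16[:, 0], design16[:, 1], design16[:, 3]
print(float(A16 @ (B16 * D16)))                                   # 0.0
```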
It looks like ratio times agitation speed is a term that I don't really need; its estimate is very small. The true model that generated the data involves these four two-factor interactions here. Ratio times agitation speed turns out to be a type I error, but we would probably get rid of it anyway, since its estimate is very small. Here we've used Fit Definitive Screening, along with the assumption of heredity between main effects and two-factor interactions, to find not only all the main effects but also four active two-factor interactions among the interactions related to the active main effects. We found the actual data-generating model — the correct model.

So what do I recommend that you do? First, I've shown you how to use the optimality criterion in the Custom Designer to create foldover designs. You can do that relatively simply, and you don't necessarily have to create orthogonal foldover designs; the Fit Definitive Screening platform doesn't care whether the design is orthogonal or not, and it will still analyze the data as long as the design is a foldover design. Once you have the foldover design and the data, you use the Fit Definitive Screening platform to analyze the data. So, in the words of Nike, "Just do it."

Here are some references. The first two are the original paper on definitive screening experiments and the paper by Xiao and co-authors, which shows how we create Definitive Screening Designs nowadays without any optimization, by direct construction using conference matrices. Then the paper by Miller and Sitter contains the basic idea I've introduced here for analyzing foldover designs; this was in Technometrics back in 2005, though we're using slightly more current model selection techniques than Miller and Sitter did. Finally, the last reference, again in Technometrics, is a paper Chris Nachtsheim and I wrote that explains how to use this foldover technique and analysis to do model selection for Definitive Screening Designs in the two-step method I talked about at the very beginning of this talk. Thank you very much for your attention, and I'll be at the talk when it's finally delivered to answer questions.
Textual analysis of written documents has become an important analytics tool in accounting and finance decision making. Several research papers have expanded textual analysis and have also measured the written tone in financial documents, converting the tone into a quantitative score of optimism/pessimism. Some of this research has connected the tone to abnormal returns in the financial markets. We explore this connection with the Sentiment Analysis platform in JMP Pro 16.

Hello everyone, my name is Nilofar Varzgani. Today I'm going to present a research study that I conducted using JMP Pro 16, and specifically, within JMP Pro 16, I used the Text Explorer platform as well as the Sentiment Analysis functionality within the Text Explorer platform. The title of this study is Textual Analysis of Earnings Conference Calls: Differences Between Firms. I'm working on this study with my two co-authors, Dr. Ugras and Dr. Ayaydin. They're not going to be presenting with me here today, but they will surely attend the presentation itself.

I'll start off with a little bit of an introduction. Textual analysis of written documents has become an important analytics tool in accounting and finance decision making. Several research papers have expanded textual analysis and have also measured the written tone in financial documents, converting the tone into a quantitative score that measures optimism or pessimism in the tone of the speaker. Some researchers have also connected this tone to abnormal returns in the financial markets. In my presentation today, I'm going to talk about how we explore this connection using the Sentiment Analysis platform within JMP Pro 16.

Let's talk a little bit about the motivation behind this study. For many years, capital market studies have researched whether quantitative information reported by firms, such as earnings, revenue, or other accounting measures, influences decision making. Recent studies have shown that, in addition to this type of quantitative information, qualitative information from the firms and from the media influences investor behavior. This qualitative information includes text in 10-K reports, earnings press releases, conference call transcripts, comment letters to the SEC, analysts' remarks, articles in the media, and conversations on social media. Several studies have also shown the importance of the earnings conference calls that immediately follow the quarterly earnings releases of public companies. In our study, we examine whether the impact of earnings conference call tone varies across different groups of companies.

Let's do a little bit of background on the literature that covers textual analysis so far. Textual analysis has been used to analyze a variety of documents through alternative approaches, and one can categorize these approaches into three broad categories. The first is the use of the Fog Index, which is basically a function of two variables: the average sentence length in number of words, and the word complexity, measured as the percentage of words with more than two syllables. The second category of techniques is the length of the report; although this seems like a rather simplistic approach, it has been useful precisely because of its simplicity, and a couple of studies have used the length of the report as a proxy for the complexity of the report itself.
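For reference, the commonly cited Gunning Fog formula combines exactly the two inputs described above: 0.4 × (average words per sentence + 100 × the fraction of complex words). The Python sketch below uses a crude vowel-group heuristic to count syllables and a made-up sentence, so the number it prints is only illustrative.

```python
# Rough Fog Index sketch; the syllable counter is a cheap approximation.
import re

def syllables(word: str) -> int:
    # Count groups of consecutive vowels as a crude syllable estimate.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fog_index(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    complex_words = [w for w in words if syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))

print(fog_index("Management anticipates substantial revenue growth. Margins improved."))
```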
The third approach is the use of a word list. There are a number of word lists that people have created themselves, such as the Henry word list or the Loughran and McDonald word list. In our study, however, we utilized the built-in dictionary that JMP Pro comes with, and we augmented it with some phrases, which we added as terms, as well as a custom list of stop words based on the sample data we were working with.

Let's talk a little bit about the data itself. Our sample size is approximately 25,000 observations, which means we analyzed close to 25,000 earnings call transcripts, and the date range for those transcripts is from 2007 Quarter 1 to 2020 Quarter 4. We have tried to incorporate only the text portion of each earnings call transcript, removing any graphics or special characters that might be part of it. All of these transcripts were downloaded from the LexisNexis database in RTF format.

Just to give you a little bit of an intro as to what an earnings call transcript looks like: it starts with the title, which mentions the name of the company for which the earnings call announcement is being made. It has the words Fair Disclosure Wire on the next line, followed by the date. Then the main call itself is divided into two sections. You have the presentation section, which contains the prepared remarks of the managerial team attending the call, and then you have a discussion between the analysts, who sit in on the call live and ask questions, and the managers, who respond to those questions — so the prepared remarks and the Q&A portion. For this study specifically, we've only looked at the prepared remarks portion of the earnings transcripts, but later on, as an extension of this study, we plan to incorporate the Q&A portion of the earnings call as well. In addition to those two blocks of text, most of these transcripts also include a list of all the participants on the call, which includes all the managers from the company side as well as all the analysts from different institutional investor sites.

Let's talk about the methodology a little bit. We extracted the transcript, and the section with the prepared remarks of the managers was titled DocBody. The Q&A was titled Discussion, and we counted the number of analysts who attended each call. Keep in mind that, because not all calls have a Q&A segment, the Q&A part might be missing for some of the rows in our sample, which is why for this conference and this study we focused only on the DocBody, the prepared remarks portion of the earnings call. We also created columns that could be used as identifiers: a ticker column for further analysis, the year-quarter, as well as a calculated column that measures the length, in number of words, of the prepared remarks section. The distribution of length is interesting, and we're going to show that output in a little bit. Before we moved on to the Text Explorer platform, we changed the data type for the document body to character and the modeling type to unstructured text so that the Text Explorer platform can work with it.
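As a rough illustration of those identifier and length columns — built here in pandas with invented ticker names, dates, and text rather than in the JMP table the presenters actually used — the year-quarter and word-count calculations might look like this:

```python
# Minimal pandas analogue of the identifier and length columns described above.
import pandas as pd

calls = pd.DataFrame({
    "Ticker": ["AAA", "BBB"],
    "CallDate": pd.to_datetime(["2007-02-01", "2020-11-03"]),
    "DocBody": ["Revenue grew strongly this quarter ...",
                "Demand was weaker than expected ..."],
})

calls["YearQuarter"] = calls["CallDate"].dt.to_period("Q").astype(str)   # e.g. 2007Q1
calls["Length"] = calls["DocBody"].str.split().str.len()                 # words in prepared remarks
print(calls[["Ticker", "YearQuarter", "Length"]])
```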
Before I show you the Text Explorer platform, I just want to talk a little bit about the terminology that will be used a lot in the Text Explorer platform and its output. In textual analytics, a term or token is the smallest piece of text, similar to a word in a sentence. You can define terms in many ways, for example with regular expressions, and the process of breaking the text down into terms is called tokenization. Another important term that will pop up a lot when we look at the platform's output is a phrase, which is a short collection of terms. The platform has options to manage phrases so that they can be specified as terms in their own right. For example, in our earnings call study, a phrase that popped up a lot was "effective tax rate." Although effective, tax, and rate are three separate terms, "effective tax rate" is used together most of the time, so we converted that phrase into a term itself so that we could analyze how many times that particular phrase as a whole is used in these conference calls. Next is the document: a document refers to a collection of words, and in a JMP data table the unstructured text in each row of the text column corresponds to a document. Then we have the corpus, which is the collection of all the documents.

Another important term we will use later in the output is stop words. Stop words are common words that you want to exclude from the analysis. JMP does come with its own list of stop words, but there might be specific stop words in the data sample you are using that apply to that data set only. We created a custom list of stop words, which you can easily view in the Text Explorer platform; you can keep a list of stop words in an Excel or txt file, upload it within the Text Explorer platform, and use those as stop words. Finally, there is the process of stemming, which combines words with identical beginnings, or stems, so that similarly rooted words are compiled as one word; for example, jump, jumped, and jumping would all be treated as a single word instead of three separate words. For our study we decided to go with the no-stemming option, because we noticed some issues with stemming. For example, a word like "ration" could be used as a stem for words like "acceleration," which has nothing to do with that word itself. So we went with the no-stemming option in our case.

On this slide we look at the options we selected for the Text Explorer platform. The main variable of interest is the DocBody. We use the ID to identify each row of observation, and we changed the default options for each of these features — the maximum words per phrase, the maximum number of phrases, and the minimum and maximum characters per word — to values we thought would be suitable for our particular data set. As you can see, we increased the maximums to well above the defaults, just to be on the safe side, so that we don't miss any important terms in our analysis. The initial output that pops up once you run the Text Explorer platform gives you a list of all the terms that were used most in the data sample, as well as the phrases. We reviewed the list of phrases and selected the phrases that could be used as terms.
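JMP's Text Explorer handles tokenization, phrases, and stop words internally; as a conceptual analogue only, here is a small scikit-learn sketch on two invented "documents" that keeps single terms plus two- and three-word phrases (so something like "effective tax rate" survives as a unit) and drops a custom stop-word list, with no stemming, mirroring the choices described above.

```python
# Conceptual analogue of terms, phrases, and custom stop words (not JMP's engine).
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "Our effective tax rate improved and revenue growth was strong.",
    "Revenue growth slowed but the effective tax rate was stable.",
]
custom_stops = ["our", "and", "was", "but", "the"]   # stand-in for a custom stop-word list

# ngram_range=(1, 3) keeps single terms plus 2- and 3-word phrases such as
# "effective tax rate", which can then be promoted to terms in their own right.
vec = CountVectorizer(stop_words=custom_stops, ngram_range=(1, 3), lowercase=True)
counts = vec.fit_transform(docs)
print(sorted(vec.get_feature_names_out()))
```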
There were a total of 30,000 phrases, out of which 1,068 phrases were added to the term list. In addition to that, we also created our custom list of stop words. We found it easier to export all the terms from our JMP sample into Excel, sort those words, and then treat as stop words all the words with certain characteristics — for example, words containing symbols, commas, or dollar signs, numbers that were being treated as text, or common names such as John, Michael, David, etc. We added all of those to our stop-word list and uploaded it into the Text Explorer platform.

Let's look at some of our output. The first analysis we did was on the variable length of the prepared remarks. The assumption here is that if the prepared remarks section is longer, the management has more to explain to investors, and that is why the complexity or the tone of those reports might differ from shorter reports where the managers don't have to do a lot of explaining. As you can see from the distribution output on the left, the length in our sample is slightly asymmetric, with a small tail toward the right-hand side, which means that some reports were longer than others. The mean length is around 3,027 words and the median is around 2,966 words. The mean and the median are not far from each other, which probably means we can treat the distribution as roughly symmetric even though the histogram looks asymmetric; the difference between the mean and the median is not large. We did look at the median length of the reports over the years, and as you can see, in 2007 the earnings calls were much longer than in the years after that, and then we saw a slight bump in 2020 as well. If you look at the quarter-wise length of the reports, you'll also notice that Q4 generally has the longest reports, because the management is explaining the functions and operations of the company for the whole year and compiling results from the previous three quarters as well. In terms of the tickers with the longest average length, Boston Properties reported the longest, at approximately 6,000 words on average, which is double the average length of the whole data set.

Next we have the word cloud. Just to compare the stemming and non-stemming options, on the screen you see both the word cloud with stemming, on the right-hand side, and the one without stemming. We preferred the no-stemming option because it lets us see the words that show up most often in these earnings calls, whereas the stemming option can end up with a word cloud that is not very explanatory. As you can see, growth, new, revenue, increase — these are the words that pop up the most, which signals that managers are mostly optimistic and positive in tone in the prepared remarks section of their reports. Then I also have a screenshot of the Sentiment Analysis platform, which shows the distribution of the overall tone, positive or negative.
As you can see from this histogram, the overall tone of these prepared remarks is mostly very positive, with only a very few earnings call transcripts falling in the negative portion of the distribution. This again signals that managers tend to be more positive and more optimistic when talking about the operations of the company, so that they can signal that the future is going to be bright and better, and that definitely affects how investors react to this tone. Next, we also looked at the overall sentiment of the calls, as well as the positive mean and the negative mean of the sentiments. As you can see, the positive sentiment is mostly around a value of 60, whereas for the negative sentiment we see a bump around minus 40, so none of these earnings calls were too negative, even if the company's performance was really bad for that particular quarter — because managers want to signal a brighter future and not focus too much on the history.

If you look at the overall sentiment versus the years, you'll notice that the overall sentiment was much lower during the financial crisis of 2007-2008, then bumped up strongly in 2009-2010, and overall it has been relatively steady except for 2020, when the pandemic hit. If you break it down quarter-wise, you can see from the bottom center graph that some quarters, specifically the fourth quarter, can see a drop in the overall sentiment across the whole data set. If you look at length versus year, again you'll notice that length was much higher in 2007, dropped in 2008, peaked again in 2009, and overall has decreased over time until 2020. It might be a safe assumption that when times are tough and companies have more to explain, the earnings calls tend to become longer and the prepared remarks are longer. However, if you look at length versus overall sentiment, you'll notice that there seems to be a slight positive relationship between them, but it's definitely not a simple linear upward trend; instead, the data is quite heteroscedastic.

Here I have a list of the companies with the highest overall sentiment over the years versus the companies with the lowest overall sentiment over the years. I also include the industries they belong to, just as an interesting piece of information we noticed: for example, a lot of the most positive calls were from the technology services or financial services areas, whereas the lowest sentiment was in the waste management or medical technology industries. In terms of the future research we plan to do on this topic, we want to examine the tone of these earnings calls and do a cross analysis with variables like managerial strategic incentives for disclosure, the impact the tone has on analysts and investors, as well as variables specific to the firm, such as size, complexity, age, etc. We also plan to explore term selection for building data mining models using the Text Explorer platform within JMP Pro. Thank you so much for attending this presentation, and hopefully we can answer any questions that you may have about it today. Thank you.
Åke Öhrlund, Galderma

Data from a tensile tester contained 92 runs, each with four columns and 3,800 rows. The sample name was between the header and the data, in one of the four columns. At first, the data was imported and stacked, which omitted the sample name. Next, the sample name was imported and stacked. Finally, the sample name was joined to the data table, allowing visualization and analysis. Instead of preparing the data in Excel, it was imported into JMP and formatted there, saving hours of work and preventing possible errors.

Hi, my name is Åke Öhrlund. I work at Galderma in Uppsala, Sweden. I've been using JMP for many, many years, almost my whole working life so far, but apparently I'm still learning, and that's what I want to share today. One of the things I've come to realize lately is that I've spent too much time preparing data for JMP in Excel, cutting and pasting. I want to show you an example of how you can do that much faster in JMP.

I got this data set from a colleague of mine. It was tensile testing data: 92 runs, each one with 3,900 rows of data. It was laid out like this, with four columns for each run, so four columns repeated 92 times. On row number three there was a sample name crammed in, and then the rows of data followed. I couldn't bring it directly into JMP as it was, so she said, "Do you want me to cut and paste it into the same columns like you usually have in JMP?" I said, "No, I'm going to try to do it directly in JMP."

This is what I did. I started by importing the data, leaving out row number three. I just have to tell JMP that there are two header rows and that it should skip row number three and start on row number four. I do that, and I have all the data with the two rows of header on top. But this is a wide table, so I want to stack it. I select all the columns and put them into Stack. Now I have to tell JMP that there are four of these columns that go together, and JMP actually seems to group them four by four the way they should be. I just click OK, and then I have a table containing all the data, a lot of data rows stacked on top of each other. I have only four columns actually containing the data, and four columns called Label that describe what is in those data columns. I could work from here, but I'd rather also have the sample name in there.

So I do the import again and start all over. This time I want row number three: I still include both header rows, but I also include row number three, so the data starts here. I click Next, and then I tell JMP to skip everything after row three. There you go: one row of data containing the sample names. This is still wide, so I want it in the long form like the data table, so I do Stack again and select all the columns into Stack. This time I don't have to tell JMP anything; it just stacks what's there. And there you have it: another data table with all the sample names. I named this column Sample. This one I can match with the original data table.

That's what I'll do. I go to the data table, which is named Untitled 3. Then I go to Update, and in Update I say, "Update with data from Untitled 6." What should I take from there? I select the column Sample, replace nothing in the old data table, and match the Label here with the Label there. I click OK, and I end up with a data table with all the data and all the sample names. Again, I could work from here, but since this was quite fast, I want to tidy up a bit. I want these labels to go up here, this label to go up there, and so on.
I just copy these, go down here, and paste them into Data 2, Data 3, Data 4, and they end up here. After doing that, I don't need the Label columns anymore, so I can just remove those. Now I have all the data in their columns, with the right column headings, and the sample name here. Now, of course, this is where the actual fun begins: it's very easy now to pick out certain samples and do whatever you like. That's the fun part of JMP, of course. What I want to convey here is: try not to waste too much time in Excel cutting and pasting, because chances are you might as well do it in JMP in much, much less time. I've been using JMP for many, many years and am still discovering all the things you can do, so I'm going to try this a lot more. That's all from me today. Thank you very much for listening.
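Outside JMP, the same stack-then-join pattern can be sketched in pandas. The tiny wide table, column names, and sample names below are invented stand-ins for the real 92-run, 3,900-row export; the point is only the shape of the workflow: reshape the grouped columns to long form, then attach the sample name that lived on its own row.

```python
# Pandas analogue of the JMP workflow: stack the wide data, then join sample names.
import pandas as pd

# Wide data: four measurement columns per run (here only two runs, suffixed _1, _2).
wide = pd.DataFrame({
    "Time_1": [0.0, 0.1, 0.2], "Force_1": [1.0, 1.2, 1.4],
    "Gap_1":  [5.0, 4.9, 4.8], "Speed_1": [0.5, 0.5, 0.5],
    "Time_2": [0.0, 0.1, 0.2], "Force_2": [0.9, 1.1, 1.3],
    "Gap_2":  [5.1, 5.0, 4.9], "Speed_2": [0.4, 0.4, 0.4],
})

# The sample names that sat on their own row in the export, one per run.
samples = pd.DataFrame({"Run": ["1", "2"], "Sample": ["Gel A", "Gel B"]})

# Stack: wide_to_long groups the four measurement columns by their run suffix,
# which mirrors stacking four columns at a time in JMP.
wide["row"] = wide.index
long = pd.wide_to_long(
    wide, stubnames=["Time", "Force", "Gap", "Speed"],
    i="row", j="Run", sep="_", suffix=r"\d+"
).reset_index()
long["Run"] = long["Run"].astype(str)

# Join: bring the sample name onto every data row (the Update step in JMP).
long = long.merge(samples, on="Run", how="left")
print(long.head())
```

The workflow in the talk stays entirely in the JMP GUI; this is just an equivalent reshape for readers who prefer code.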
Structural equation modelling (SEM) is a method of model construction that displays the variance and covariance in and between latent and manifest variables visually to define a system. Science and engineering models are typically constructed using various relationships, but these relationships are usually given in the form of equations, which can rapidly become very complex. SEM provides a visual method to determine how each of the measured variables is affected by the underlying cause of these observable responses. Usually, SEM is applied in the fields of psychology and sociology, where relationships between the underlying cause of measurable variables and the measurable variables themselves cannot be directly ascertained. However, there is scope for this method to be applied to science and engineering problems to help understand the underlying causes of responses within a system. In this poster, the concept is demonstrated with one of the simplest models in SEM, linear regression, which forms the basis of many models within engineering. This basic building block of SEM can then be extrapolated to systems with many variables to determine the overall effect on the system, by allowing any variable to be treated as both a cause and an effect of other variables.

Hi, I'm Jordan Walters, I'm a Technical Intern at JMP, and I'm going to present to you today about structural equation modeling and how we can use it with linear regression to find relationships that might not otherwise be obvious. To do this, we're going to look at a case study about the manufacturing of pharmaceutical tablets. Within this case study, there are lots of factors and responses that need monitoring, and we need to determine the relationships between these factors and responses to work out how the system works. Doing this will allow us to find the optimal conditions to make the best tablet possible. To make the case study a lot simpler, rather than focusing on all the responses and all the factors, we're just going to focus on one response and one factor. In this case, the one factor we're looking at is the percent composition of water, and the one response is the density of the tablet, which is going to be our metric for whether or not we have a good tablet. Just for reference, the tablet density is in milligrams per centimeter cubed.

Typically, the way we define the relationship between this response and factor is through a simple linear regression, which, as the name suggests, gives us a linear relationship between the response and the factor, and this is what we need to fully understand the system. The problem with linear regression is that it only allows us to find the direct effect of changing the water composition on the density of the tablet. It's not going to let us find anything else about the system or any underlying features, and because of this it might fail to give a complete picture of the system. This is where structural equation modeling can come in, to give us a more comprehensive view of the system, which will allow us to find relationships that might not otherwise be obvious. A bit of background on linear regression: in its simplest form, linear regression is just a linear relationship between one variable and another.
And since linear regression is, by definition, linear, the two variables are connected by a linear equation, which in this case is Y = mX + C, where Y is the response, X is the factor, m is our gradient, and C is our offset. The m and the C are just constants that we find to fit this regression equation so that we can relate our Y and X variables. To do this in the context of the case study, we plot the X and Y variables and come up with the regression equation in JMP, and JMP can very simply do this for us. It gives us a lot of information, some of which we don't need for performing this linear regression, but the two really important pieces of information we get are the regression coefficients, which are our m and our C, shown about here, and the regression equation, which is the full form of our linear regression. JMP gives this to us in the form Y = mX + C, but for clarity, and to see how this works in our case study, it's presented here as the actual relationship: the density is equal to 0.117 times the percent water composition plus 95.903, which is reported above with the m and C values. All this relationship means is that for every one-unit increase in percent H2O, we get a 0.117-unit increase in density. But we have that C value at the end, which offsets our density by 95.903. The purpose of the C value is to give a scale to our data: without that scale, what we have is just a relationship between density and percent composition of H2O; adding this constant at the end gives us the scale to put the result in the units we need for this case study. So now that we've got an idea of how we traditionally go about looking at this problem, let's start building the structural equation model.

To transform a traditional linear regression into the form of a structural equation model, we need to do a few things. We begin by moving this relationship away from its graphical form into this visual path-diagram form, where we can see our X factor in a rectangle linked to the other rectangle, which is our response, and between these two rectangles there is an arrow. A single-headed arrow just means a relationship going one way; in structural equation modeling it is possible to have a double-headed arrow, but that's not particularly important in this example. What the single-headed arrow means is that the H2O composition affects the density of the tablet, but the density of the tablet does not affect the H2O composition, and this allows us, with this single X and Y, to perform linear regression. Now, the confusing part about this diagram is probably the 1 in the triangle at the top. Simply put, all it does — and it is usually hidden within a structural equation model — is set the scale for the entire model, and we can see that through how the structural equation model reports the data. If you look along the arrow between the 1 in the triangle and the Y response, the density of the tablet, you'll see that we actually get a number on it, 95.903, which we've seen before: it's the C value from our linear regression. This further shows that this part of the diagram is what gives the model its scale. So we can see where the C comes from; that leads directly to the question of where the m comes from.
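As a minimal sketch of the same fit outside JMP — on synthetic (%H2O, density) values, so the coefficients will not reproduce the 0.117 and 95.903 reported from the case-study data — ordinary least squares returns exactly the m and C being discussed:

```python
# Simple OLS on made-up (%H2O, density) data to read off slope m and intercept C.
import numpy as np

h2o = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])                   # % H2O composition
density = 95.9 + 0.12 * h2o + np.random.default_rng(0).normal(0, 0.05, 6)

m, c = np.polyfit(h2o, density, deg=1)                            # slope, intercept
print(f"density ~ {m:.3f} * %H2O + {c:.3f}")
```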
Well, the m actually comes from the connection at the bottom there, between the percent H2O composition and the density of the tablet, and we can see that it is 0.117, exactly the same as we found before. Bear in mind that these two analyses were performed in two different platforms. Now, we have the two parts of our Y = mX + C equation there, but this schematic has three sides, so you might be wondering what that final side is. Simply put, the left-hand side, between the 1 and the %H2O composition, is just the X part of the SEM. Similar to how the C value, which gives the offset of Y, is the Y-intercept of the graph — which is typically important and quite useful statistically — we also get this X value from the structural equation model. It isn't particularly important, because we don't usually look at the X-intercept, but it's interesting that the analysis can provide this additional information without our really asking for it. This is all computed automatically, like I say, and typically this top part is hidden; it's shown here just to give context as to what's actually happening.

Looking at what we've got here, we have these solid lines, which denote that something is statistically significant, and between the H2O composition and the density of the tablet we have a dashed line, which denotes that that relationship is not statistically significant. The fact that the coefficient is very low, at 0.117, implies it's not that strongly related anyway. So if we took this at face value, we might be inclined to agree with our initial linear regression and say there's no effect of H2O composition on the density of the tablet. But since we've already got our data in the form of a structural equation model, let's continue exploring and see if we can uncover any hidden relationships.

We know that this entire case study is built up of lots of X factors and lots of Y responses, and we chose to simplify it down to just one factor and one response. So let's think about some of the other factors within the system and how they might relate to this response. One example that occurs quite a lot within this case study is a mediation variable. What we mean by a mediation variable is a variable that can't exactly be controlled directly, but that has an impact on our final response while still being affected by one of the factors we can control. In this case, we're going to take crushing strength as one of our mediation variables. The reason this is a mediation variable is that the crushing strength can be changed in this process, and it does have an effect on the density of the tablet, but the crushing strength can only be [inaudible 00:08:50] a certain operational range depending on the water composition within the tablet. If the tablet is too dry, we need to change the crushing strength to match, so it doesn't completely powder the tablet; if it's too moist, with too much water in it, then it might need a much softer crushing. In this way, you can see that this is a variable we can't change directly, because it is affected by the water composition, but it is somewhat important to the density of the tablet, which we'll explore now.
If we add this into our path diagram in our structural equation model, you can see that we get a new triangle between our variables, this time with the number 1 in the middle. What this is doing, in the context of the case study, is saying that we now have our direct effect, which is the connection between the H2O composition and the density of the tablet, and we also have our indirect effect, which comes from the connection that goes all the way around the diagram, through crushing strength and into the density of the tablet. What this means is that this is actually a two-part equation: not only do we have an extra C, as before when there was one X variable, we now also have another variable in the middle part of the equation. Because of that, the first thing we notice on the path diagram is that the connection between the H2O composition and the density of the tablet is 0.054, which is already a much weaker relationship than we saw last time, when we included just the one variable. That's because the rest of the effect actually comes from elsewhere, as we're exploring here. So again we can see that that connection still is not statistically significant, and in fact has an even weaker coefficient than before. Now, if we look at our other connections, say between crushing strength and the density of the tablet, we see a statistically significant and reasonably strong relationship at 0.593.

So what does this actually mean in the context of this case study? It means a couple of things, and we can draw a couple of conclusions. Firstly, the direct relationship between the H2O composition and the density of the tablet is not statistically significant, which isn't something we knew before; it was hidden by the fact that we only performed the linear regression. The second thing we've learned is that crushing strength and the density of the tablet are correlated. In a practical sense, this means we can conclude that if we want to control the density of our tablet, then this must be done by changing the H2O composition, but only as a means to alter the crushing strength to the desired value. So if we actually came to optimize this entire case study, the crushing strength is where we want to pay attention, and to anything we can do to affect it — which isn't something we can change with just a dial. In this way, you can see how SEM has allowed us to uncover relationships that may be missed by traditional modeling methods, and in turn it provides a much deeper understanding of how our system works. So next time you're exploring a system, I'd encourage you to consider applying SEM to help you understand the system better, both visually and in more depth.
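A rough regression-based sketch of the mediation idea, on simulated data and outside JMP's SEM platform: the direct effect is the %H2O coefficient once crushing strength is in the model, and the indirect effect is the product of the %H2O → crushing strength path and the crushing strength → density path. The variable names and effect sizes are invented for illustration.

```python
# Direct vs. indirect (mediated) effects via two regressions on simulated data.
import numpy as np

rng = np.random.default_rng(7)
h2o = rng.normal(5, 1, 200)                        # factor we can set
crush = 2.0 * h2o + rng.normal(0, 1, 200)          # mediator, driven by %H2O
density = 0.05 * h2o + 0.6 * crush + rng.normal(0, 1, 200)

# Path a: %H2O -> crushing strength
a = np.polyfit(h2o, crush, 1)[0]
# Paths b (mediator -> density) and c' (direct), from a joint regression of density
X = np.column_stack([np.ones_like(h2o), h2o, crush])
_, c_direct, b = np.linalg.lstsq(X, density, rcond=None)[0]

print(f"direct effect  c' = {c_direct:.3f}")
print(f"indirect effect a*b = {a * b:.3f}")        # most of the effect flows via crushing strength
```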
The BioChaperone platform developed by Adocia covers oligomers of various sizes. However, they have the common characteristic of being of higher molecular weight than the impurities generated during their synthesis process. Thus, their purification can be achieved by diafiltration, or tangential-flow filtration. This technique, thanks to a membrane with pores of defined size, makes it possible to separate molecules according to their molecular weight. Different parameters such as temperature, tangential flow rate, and pressures have an impact on the purification efficiency, as well as on the duration of the operation. However, the multivariate study of this step was hampered by technical constraints, such as the impossibility of changing the temperature between each trial, thus limiting the randomization. Therefore, a split-plot design approach implementing a randomized block system was developed to optimize the purification of our excipients.

Hi everyone. First of all, I want to thank the Discovery Summit committee for giving me the opportunity to present the work we performed in my team on the implementation of a split-plot design to study the purification of our innovative excipients. First, ADOCIA is a biotechnology company founded in 2005 by Gérard Soula and his two sons, and we are located in Lyon, France. Our mission is to develop innovative formulations of approved hormones for the treatment of diabetes and obesity. The business model is to license products after proof of concept. Currently in our pipeline we have three patented technology platforms and one product that is approved to enter phase three in China, five products with clinical proof of concept, and six projects at the preclinical stage. We are about 115 people at ADOCIA, and 80 percent are dedicated to R&D.

Speaking about the technology platforms, today we will talk about the historical one, which is the BioChaperone platform. BioChaperone is a pharmaceutical excipient, a synthetic organic one, and it forms a complex with a protein such as insulin, amylin, or glucagon. This complex, inspired by nature, will improve the solubility or stability, accelerate the absorption of the peptide, or protect it against enzymatic degradation. The BioChaperone platform potentiates the performance of insulins and other hormones, and today five proprietary products based on BioChaperone are in clinical development.

The development of BioChaperone chemistry is, I think, the same as in many pharmaceutical companies. We go from an early-stage process that delivers batches of a few grams, which enter preclinical studies such as toxicology and efficacy. Once a BioChaperone is designated as a lead, it comes to my department, to my team, to develop a final process to deliver phase three batches and, in the end, commercial batches in the range of a few hundred kilos per year. The changes we face are imposed by large-scale feasibility, we are driven by cost and performance, and many changes are unavoidable, so we need to understand them and document them. The goal of our work is to have a complete understanding of the relationship between parameter variation and its impact on product quality. Indeed, as we will work at large scale, we know that temperature cannot be targeted at exactly 10.0 degrees every time; it can be 11, it can be 9. This is inherent to the large scale. We speak about robustness: the process needs to absorb this inherent variability within a defined range.
And we need to know its impact on product quality. At this stage we go from reproducibility at the early stage to robustness at large scale, at the final process stage. To do that, we have tools. Two of them are risk analysis, which helps us prioritize and rank the work, and DoE, which is very useful because we know that in chemistry there are a lot of interactions between parameters.

Today we will talk about the purification of the excipients. This purification is done by diafiltration. A quick overview of the process: raw material enters a chemical transformation that gives a crude excipient in solution, and this crude excipient in solution is purified by diafiltration to give the pure excipient in solution. What is diafiltration? First, there is classical filtration, called dead-end filtration, in which we have a solid — here an excipient — in a solution, and a membrane; the solution goes from top to bottom under pressure, and we recover the solid on the upper face of the membrane. In cross-flow filtration, we have the retentate, which is brought in parallel to the membrane using a pump, and we apply a pressure, using valves, that pushes part of the flux through the membrane. The membrane has pores of defined size and lets only small molecules go through, while the big molecules stay in the retentate. We use this technology because our excipients are oligomers, which means they are not small molecules; they are not polymers either, they are between the two sizes, but they are quite big and they stay in the retentate.

A quick overview of the unit: we have a vessel with the retentate, which is brought through the membrane using a feed pump, and a back-pressure valve allows us to maintain a pressure in the membrane and push part of the flux into the permeate, which eliminates the small-molecule impurities. On the right you see the 50-liter-scale diafiltration pilot that we use at ADOCIA; at the back is the 50-liter vessel, and on the upper left is the housing with the membrane and all the pipes, valves, and instruments used to monitor the whole process. For our study we had only one bulk of crude excipient solution that we could use. The idea was to have a full recirculation of the flux, meaning that the permeate is brought back to the retentate all the time, so that we have a retentate that is representative of the process at the beginning of every run of the DoE.

For a diafiltration, some factors are determined very early in the process: the membrane reference, meaning the cutoff size of the pores and the material of construction — once it's set, it's set, and we will not change it — and the loading, meaning the kilograms of excipient per surface of membrane; the membrane has a defined surface and the kilograms of excipient are defined by the process, so once it's set, it's set, and we will not change it. What we can tune as factors are the assay, the concentration of the BioChaperone in the solution that we need to purify; the feed flow, which is the flux imposed by the feed pump; the transmembrane pressure, which is a way to control the pressure that pushes the permeate through the membrane; and the temperature of the solution.
The responses we look at are the losses through the membrane, which impact the yield of the process; the impurities that go through the membrane into the permeate, which impact the quality of the product and are the most important part of this work; and the permeate flow rate, which impacts time and gives insight into the whole process time at large scale. The objective of the study was to define a design space, which is a multivariate space that guarantees the conformity of the responses; here, it's the quality of the product. And we go for a design type which is a response surface model. For our first attempt, we ran a Box-Behnken design. To build a Box-Behnken design, you go to DOE, Classical, Response Surface Design. I load the responses for run one, and I load the factors for run one. Okay. Here we have the three responses I spoke about earlier: the losses, the impurity elimination, and the permeate flux. And we have the four factors, which are temperature, pressure, concentration or assay, and feed flow. Box-Behnken is the first proposal in this box. We continue to make the table, and we have the standard Box-Behnken table with randomized runs, as you can see. We started to run this DoE, and after one trial it was clear that we would not be able to run it in a randomized order, because we cannot concentrate or dilute the bulk between each run. Concentrating the bulk through the membrane is perfectly feasible, but we would lose some impurities, and our bulk would no longer be representative of the upstream process, so it is not a solution. We could distill the bulk, but it would take a very long time because it's water. And the second parameter that is not easy to change between runs is the temperature: due to the recirculation, it takes quite a long time to stabilize the temperature between runs when we have to change it. So we did something that will make some people in the audience scream: we ordered the runs by temperature and by assay. Let me just add some color on it to have a better view, with value colors. This is what we did: we have the assay arranged in blocks, the 90 block, then the 60 block and the 30 block, and within each assay block we have the temperature ordered 40, 30, 20, and so on. We ran this DoE like that. Here are the data we obtained. We can analyze them using the Fit Model platform with the four factors; we look at the losses and run it. We have quite a good model for the losses: a good p-value, significant parameters. It's okay. But the thing is, we were quite disappointed by the statistical approach, because we know that the first rule is to randomize runs to get a good estimation of the error. It was not satisfactory. We went back to our studies, I would say. Just as a reminder, the state of play was that we cannot use a fresh batch for each run, because too much bulk would be required and we don't have it; the bulk assay cannot be changed between runs, to keep the bulk representative; and the temperature cannot be changed easily, because it would be highly time consuming. We looked at books, and at the end of many books we found a solution, which is the split-plot design. Split-plot designs were introduced in agriculture, because there you typically have hard-to-change factors such as the farming field. Let's say you want to study different treatments on different cereals or crops and you don't have enough room in one field: you will have many fields, and these fields are different. 
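As an aside from the transcript, the 27-run, four-factor Box-Behnken table and the "ordered by assay, then temperature" trick described above can be reproduced outside JMP. The following is a minimal Python sketch, not the JMP workflow itself: the assay (30/60/90) and temperature (20/30/40) settings are taken from the talk, while the feed-flow (CFF) and pressure (TMP) levels are left in coded units because their actual values are not given.

```python
# A minimal sketch of the 27-run, 4-factor Box-Behnken table, then sorted by
# the hard-to-change factors, mirroring what was done in JMP. CFF and TMP stay
# in coded units (-1, 0, +1) because their real settings are not stated.
from itertools import combinations
import pandas as pd

factors = ["Assay", "Temperature", "CFF", "TMP"]

# Box-Behnken: for each pair of factors, a 2^2 factorial at +/-1 with the
# other factors held at 0, plus center points (3 here -> 24 + 3 = 27 runs).
rows = []
for i, j in combinations(range(len(factors)), 2):
    for a in (-1, 1):
        for b in (-1, 1):
            run = [0] * len(factors)
            run[i], run[j] = a, b
            rows.append(run)
rows += [[0] * len(factors)] * 3          # center points

design = pd.DataFrame(rows, columns=factors)

# Map the two hard-to-change factors to the settings quoted in the talk.
design["Assay"] = design["Assay"].map({-1: 30, 0: 60, 1: 90})
design["Temperature"] = design["Temperature"].map({-1: 20, 0: 30, 1: 40})

# The randomization constraint: group runs into assay blocks, then order
# temperature within each block (what made "some people scream").
design = design.sort_values(["Assay", "Temperature"],
                            ascending=[False, False]).reset_index(drop=True)
print(design)
```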
But this is not the thing you want to study. The idea behind split-plot designs is to have whole plots, the fields, which are analyzed as random blocks. Then within each whole plot you apply treatment or crop one, two, three, four, and you study them inside the whole plot; these are called subplots. How is it done in JMP? I will close this, this, this, and this. To build a split-plot design, you go to the DOE platform, Custom Design. I load the responses for run two, and the factors. What do we see here? We see our four factors: assay, temperature, CFF, which is the feed flow, and the pressure. And we have an additional column, which is named Changes, where you can set whether a factor is very hard, hard, or easy to change. Very hard to change is the concentration, or assay. The hard-to-change factor is temperature, because we can change it, but not inside an assay block. And the easy-to-change factors are the two other factors, which can be randomized between runs. We want to go for a response surface model, so you click on RSM. Here we put six whole plots and 12 subplots. We have 36 runs, and we make the design. It will take a few seconds to make the design. Just to remind you, the Box-Behnken DoE was 27 runs; we have many more runs in this DoE. I will make the table, and as for the Box-Behnken, I will add some colors. Sorry, I will redo the table. Okay. You see that we have the assay arranged in blocks, corresponding to whole plots one, two, three, four, and so on, and within each assay block you have temperature blocks, which are the subplots one, two, three, four, and so on. CFF and TMP are randomized inside those two blocks. We performed this DoE, here are the data, and you can go to the model. Here we see the differences in the analysis between the standard Box-Behnken DoE and the split-plot design. We have the whole plots added to the effects, and they are treated as random blocks, and we have all the other parameters and effects. And here we have the method, which is the REML analysis method. We run the model. If we focus on the losses response, we see that we have a very good model, with 96 percent of the variation explained by this model, and we see that we have significant effects. The additional box we have to look at with this analysis is the REML variance components estimates, which gives us insight into the variance introduced by the blocks, the whole plots, and we see that the p-value is not significant. We can go further: there are no issues with the blocks, so as with any other DoE performed in JMP, we can go to the profiler to optimize, to define ranges and the design space. Here we see that the assay is the most impactful factor on every response, that losses is the response impacted by all four factors, and that with the parameters we use as targets, the optimization is pretty good. What could be interesting is to look at this DoE using a standard analysis. I remove everything, take the four factors and all this for a standard analysis, and run it. I come back to the losses response. Here I look at the losses. What we can see is that we don't have the exact same order for the parameter effects or estimates. 
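For readers who want to see the shape of this split-plot analysis outside JMP, here is a rough Python sketch of the REML idea described above: whole plots entering as random blocks and fixed effects estimated by REML. The file name and the column names (losses, Assay, Temperature, CFF, TMP, WholePlot, Subplot) are hypothetical, only main effects are shown rather than the full response surface model, and JMP's REML variance-component report is approximated here by the mixed-model summary.

```python
# A rough sketch (not the JMP analysis itself) of a split-plot style fit:
# whole plots as random blocks, estimation by REML. Column names are
# hypothetical, and only main effects are included for brevity.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("splitplot_runs.csv")    # hypothetical export of the 36-run table

model = smf.mixedlm(
    "losses ~ Assay + Temperature + CFF + TMP",    # fixed effects (main effects only)
    data=df,
    groups=df["WholePlot"],                        # whole plots treated as random blocks
    vc_formula={"Subplot": "0 + C(Subplot)"},      # extra variance for the temperature subplots
)
fit = model.fit(reml=True)                         # REML, as in the JMP report
print(fit.summary())                               # fixed effects and variance components
```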
In the standard analysis, we would say that assay is the most impactful factor, while in the split-plot analysis it is the flux and assay is the fifth one. This is quite normal: when we use blocks and do a standard analysis, we give more strength to these blocks, we make errors on them, and we declare them impactful when they are not. This is perfectly normal, and this is why you need to do an analysis using REML and blocks to get the right order of impactful factors. In conclusion, in this case study the split-plot design allowed us to carry on regardless of the non-randomization of some parameters. We were able to run the whole DoE in a shorter time frame, even though it required 36 runs versus 27, because we did not need to concentrate the bulk or stabilize the temperature between each run, so it was much shorter. And we are able to properly justify the design space with strong statistical evidence. It is worth noting that the split-plot design platform is now implemented to quickly develop and optimize our proprietary excipient purifications. As take-away messages: be careful about factor randomization. We know it's the first rule when running a DoE, but it's very important to have a proper design that gives you statistical knowledge of your process, and a proper design can save time even if more runs are required. Thank you for your attention.
The definitive screening design (DSD) is almost certainly the 21st century's most exciting and useful innovation in design of experiments (DOE). As a screening design, the DSD offers unique properties for a much smaller number of runs. And, if only half or fewer of the factors are active, the DSD gives you the ability to fit the full response surface model. However, if your objective is to estimate the full response surface model for most or all of your factors, a DSD is inappropriate. In those instances, larger optimal designs or central composite designs (CCDs) are the preferred choices. Orthogonal minimally aliased response surface (OMARS) designs are a new family of response surface designs (RSDs) that bridges the gap between the small, efficient DSD and the large, high-powered CCD. In this presentation, we introduce OMARS designs by way of a case study comparison with other designs. We also demonstrate how JMP users can create and evaluate OMARS designs against DSDs and classical RSDs in an easy-to-use add-in that will help you to select the right design for your specific application.       Okay, so welcome. I'm Phil Kay, and I'm joined by Hadley Myers. We're going to talk about OMARS Designs, this new family of design of experiments, and an add-in that gives you a gateway into that world. I'll start with an introduction, and I'm going to give you a motivating case study. I'm going to talk about how these OMARS Designs bridge the gap between the small, efficient Definitive Screening Designs that we're all familiar with, and the larger, high-powered, more traditional response surface designs that you might know of. Then I'll pass over to Hadley, and he'll talk about how you, as a JMP user, can create and evaluate different OMARS Designs with an add-in that he's been working on. These OMARS Designs come from a paper by Jose Nunez Ares and Peter Goos. I'll just show you that briefly. They've worked through the enumeration of thousands of such designs, and we'll introduce you to what these designs look like. I've got a motivating case study to begin. This is from a published case study, published in the Journal of Clinical Chemistry. It's a response surface design. It's about optimizing clinical chemical methods, an assay, in this case. The objective was to optimize an assay method, maximizing the response, which is called Elevated Serum 30 degrees C. They had six factors, each of which was a quantity of a different reagent, and they took a traditional approach. This was done quite some time ago. They generated a Central Composite Design, which is a very traditional response surface design for optimization, with 48 runs, including four center points. I've used this to motivate the use of OMARS Designs. As an alternative to this 48-run design, I generated a 17-run Definitive Screening Design for those six factors. I also generated an alternative 31-run OMARS Design. I took the model from the original 48-run experiment, so I used that data, fit a model, and used that to simulate the responses that we might expect for the Definitive Screening Design and for the OMARS Design. We added an appropriate amount of noise to that to give us a realistic response simulation. What you're going to see through this example is that the Definitive Screening Design is effective at what it should do, which is finding the most important factors. 
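The simulation step described above, fit a model to the published 48-run data, predict at the runs of the smaller design, and add noise, can be sketched in a few lines. This is a hypothetical reconstruction, not Phil's actual script: the file names, the response column name, and the shortened factor names (pH, P5P, OG, MDH, Asp, Tris) are illustrative, and the noise is scaled to the residual error of the big experiment.

```python
# A sketch of the response simulation: fit a full quadratic (RSM) model to the
# published 48-run CCD, predict at the 17-run DSD settings, and add noise at
# roughly the residual scale. File and column names are illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

ccd = pd.read_csv("ccd_48run.csv")        # published experiment (hypothetical export)
dsd = pd.read_csv("dsd_17run.csv")        # 17-run DSD factor settings

factors = ["pH", "P5P", "OG", "MDH", "Asp", "Tris"]
rsm_terms = (factors
             + [f"I({f}**2)" for f in factors]                          # quadratics
             + [f"{a}:{b}" for i, a in enumerate(factors)               # two-factor
                for b in factors[i + 1:]])                              # interactions
fit = smf.ols("Response ~ " + " + ".join(rsm_terms), data=ccd).fit()

rng = np.random.default_rng(2021)
noise_sd = np.sqrt(fit.mse_resid)         # residual error of the big experiment
dsd["Response"] = fit.predict(dsd) + rng.normal(0.0, noise_sd, len(dsd))
```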
The OMARS Design enables us to optimize the process by identifying and estimating all of the important effects from the response surface model. In this way, we are saying that these OMARS Designs, you can think of them as bridging the gap between Definitive Screening Designs and the traditional response surface method designs, like the Central Composite Design. Here is that Central Composite Design. Here are our six factors, and this is our response of interest. These are some of the models that we fit, and we're comparing. It's a traditional face-centered Central Composite Design. This is just three of the factors visualized. You can see we've got our axial points here on the face of the cube that's described by the factor ranges. Those kinds of designs are very good. They've got lots of nice properties in terms of the correlations between effects. You can see lots of white space here, which means zero correlations, orthogonal effects. They're not so great with the quadratic effects. There are fairly strong correlations of all of our quadratic effects with one another, which does reduce the power of our ability to estimate these quadratic effects. Nevertheless, we can fit a good model to that. This is the model fit to that original data. We identified that there really are four critical factors out of the six, and there are various higher order terms as well that are important. Really the pH, this P5P, OG, and MDH are very important. The L-aspartic acid and this Tris buffer are much less important. We can build a good model using that design. It's quite a big, expensive design, though: 48 runs. What would our alternatives be? Well, Definitive Screening Designs are obviously very good for screening in these kinds of situations, screening for the important factors. I've generated, using the same factors and the same factor ranges, a 17-run Definitive Screening Design for those six factors. Again, I've simulated the response data there based on the model from the published data from the big experiment. The Definitive Screening Design does what it's supposed to do. It finds that we've got these four important factors, the pH, P5P, OG, and MDH. It's identified those, and it's been able to identify some of the higher order effects that are important. Now at this stage, what you could do is augment. A screening design is all about screening for the important factors, and then in the next step of the experimental sequence, we can augment to learn more about the higher order effects, the higher order terms for the response surface model. What I'm going to show you here, though, is an alternative approach we could have taken. Here is an experimental design with 31 runs. Again, same six factors, the same factor ranges. You'll notice that it is a three-level design. For each factor, we've got settings at three levels. It's a response surface design. Using the Compare Designs platform, we can compare those two designs. The Definitive Screening Design is here in blue; we're looking at its powers versus the 31-run OMARS Design. Well, it's not a surprise that the 31-run design has higher power. Generally, we've got more runs, so we would expect that. We can see significantly higher power for these quadratic effects, though. Another thing to look at that might be of interest is the color map on correlations. Here's our 17-run Definitive Screening Design, and you might recognize that color map if you know anything about Definitive Screening Designs. This color map is really key. 
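The color map on correlations mentioned above is just the matrix of absolute pairwise correlations among the columns of the model matrix: main effects, two-factor interactions, and quadratics. Here is a small sketch of how such a map could be reproduced for any three-level design held as a coded pandas DataFrame; it is an illustrative approximation of the JMP diagnostic, not a replacement for it (for a purely two-level design the quadratic columns are constant and would have to be dropped).

```python
# A sketch of a "color map on correlations": expand a coded design into main
# effects, two-factor interactions and quadratics, then plot the absolute
# pairwise correlations between those columns.
from itertools import combinations
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def model_matrix(design: pd.DataFrame) -> pd.DataFrame:
    """Main-effect, two-factor-interaction and quadratic columns."""
    cols = {f: design[f] for f in design.columns}
    for a, b in combinations(design.columns, 2):
        cols[f"{a}*{b}"] = design[a] * design[b]
    for f in design.columns:
        cols[f"{f}^2"] = design[f] ** 2          # constant (and unusable) for 2-level designs
    return pd.DataFrame(cols)

def correlation_map(design: pd.DataFrame, title: str) -> None:
    X = model_matrix(design)
    corr = np.abs(np.corrcoef(X.values, rowvar=False))
    plt.imshow(corr, cmap="Blues", vmin=0, vmax=1)   # white-ish = near-zero correlation
    plt.xticks(range(len(X.columns)), X.columns, rotation=90)
    plt.yticks(range(len(X.columns)), X.columns)
    plt.title(title)
    plt.colorbar(label="|correlation|")
    plt.show()

# e.g. correlation_map(dsd_17run_coded, "17-run DSD")
```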
It demonstrates a key property of Definitive Screening Designs, which is that all of our main effects are orthogonal to one another, and the main effects are also orthogonal to the second order effects, the quadratics and the two-factor interactions. That's what all that white space there means. Then within the higher order terms, there is some degree of correlation, but no complete correlation, no aliasing. We are always able to estimate some of these higher order terms, and those higher order terms are, at least, orthogonal to and completely separately estimated from the factors' main effects. Now if we look at this 31-run OMARS Design, you can see it's got similar properties. Again, we've got orthogonal main effects, and those main effects are orthogonal to the second order effects. You can see we've got lower correlation between the quadratic effects, for example. Overall, with the two-factor interactions as well, there are lower correlations. Why are these things called OMARS? Well, OMARS stands for Orthogonal Minimally Aliased Response Surface designs. Again, we've got orthogonal main effects, and we've got minimal aliasing between our second order effects as well, and it's a response surface design. In fact, both of these, both the Definitive Screening Design and the 31-run design, are OMARS. They are both Orthogonal Minimally Aliased Response Surface Designs, so DSDs are a subset of OMARS. How well does this perform? What I've done is I fitted a model to that simulated data. Again, I simulated the response data for our 31-run OMARS Design, and I've compared that model against the 17-run Definitive Screening Design. The 17-run Definitive Screening Design is doing a reasonable job of predicting the actual data. Here, we're comparing how well our two models, from the Definitive Screening Design and the OMARS Design, fit against the actual data from the 48-run published example. You can see a much improved model with the 31-run OMARS Design, as we might expect. In fact, the 31-run OMARS Design has correctly identified the higher order terms that are important, as well as identifying the important factor effects, which was pretty much all the Definitive Screening Design was able to do. Again, just to reiterate, what we're showing here is that these OMARS Designs are really an extension of Definitive Screening Designs, and they are a bridge between that small, efficient Definitive Screening Design and the larger traditional response surface designs. At this point, I'll hand you over to Hadley, who's going to show you more about an add-in that he's created that will enable you to actually explore this new class of designs for yourself. All right. Thank you very much, Phil. Hello to everyone watching this online, wherever you are. Thank you very much for clicking on this talk. Before I take you through the add-in to show you how you can use it to generate these designs and select the best one for you, I'd like to say that the add-in itself includes 7,886 files, each one containing a design where the main effects are orthogonal to each other and to the higher order terms. The add-in not only gives you access to these 7,886 new designs, but it also gives you access to all of these designs with an added center point. How can we select from among these almost 16,000 designs the best one for our situation? The add-in provides us an interface to allow us to do that, and I'll show you how that works. Right now, the add-in is called OMARS Explorer. 
What it allows us to do is first indicate the number of factors that we have, and the add-in at this moment has the ability to generate designs for five, six, or seven continuous factors. We can enter the maximum number of runs that we can afford, or that we'd like to do, as well as whether we'd like a design for which we can estimate all main effects; all the main effects plus all the two-factor interactions; or the full response surface model. We have the option of generating parallel plots, something we can use to help us select the right design; I'll show you how that works. So I'll press okay. I can put in the names of my factors as well as the high and low settings, but I'm just going to leave it the way it is for now. I've been given this table with 2,027 designs that satisfy our requirements: each one has five factors and no more than 35 runs, and we can fit a full response surface model. So how can we now select the best one? Well, one thing we could use is the local data filter, where we can select designs of a certain run length, with or without center points, as well as our efficiencies, the average or max variance of prediction, the powers for the intercept and the main effects, and then the minimum and average powers for the two-factor interactions and the square terms, if we have a full response surface model. Because we generated the parallel plots, we also have the parallel plot here. We can use all of this to zero in on the best designs among all the ones that we've chosen. If the minimum power of the square terms is important to us, I can narrow my search to 10 designs rather than the 2,000-plus designs that were possible. Once I've done that, I can run this Get Summary Results script on the table and generate this table here with the names of the designs, whether each design includes a center point or not, the number of runs, as well as all of the metrics. Let's see, I think I'll go ahead and just choose this one here. I can press Make Design, and now I've been given this design in JMP. One thing I'll add is that if you choose a design with the center point, it adds a -0 at the end to indicate that the center point has been added to that design. I can now go ahead and add my response column, save the table, and I'm ready to start conducting my experiment. As Phil mentioned before, Definitive Screening Designs are a subset of OMARS Designs. Of course, there are many other Orthogonal Minimally Aliased Designs that are not Definitive Screening Designs. I'll show you an example here that uses six factors and a maximum of 20 runs. In this case, we only have eight designs that meet these criteria. I'm just going to go ahead and select all of them and press Get Summary Results. Now, this 13-run design here with the center point is actually the Definitive Screening Design for six factors. You can see that this design is, in every way except the power for the intercept, better than this 15-run OMARS Design, which is not a Definitive Screening Design. But I'm going to go ahead and select both of these so that I can compare the designs. When I do that, it opens both tables as well as this Compare Designs platform. Scrolling down to the color map on correlations, I can see that the Definitive Screening Design, which is this one here, looks as I would expect it to. Of course, the OMARS Design is also orthogonal for the main effects; that's what defines it as an OMARS Design. 
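The interactive filtering Hadley does here, narrowing roughly 2,000 candidate designs by run size and by the minimum power of the square terms, is essentially a table query. Below is a hedged pandas sketch of the same idea, with a hypothetical catalog file and invented column names; the add-in's actual export will differ.

```python
# A sketch of the kind of filtering done interactively in the add-in: start
# from a catalog of candidate designs with their metrics and narrow it down.
# The catalog file and its column names are hypothetical.
import pandas as pd

catalog = pd.read_csv("omars_catalog_5factors.csv")

shortlist = (
    catalog
    .query("Runs <= 35")                    # what we can afford
    .query("MinPowerQuadratic >= 0.8")      # square terms matter to us
    .sort_values(["Runs", "MinPowerQuadratic"], ascending=[True, False])
)
print(shortlist[["Design", "Runs", "CenterPoint",
                 "MinPowerQuadratic", "AvgPredVariance"]].head(10))
```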
But you'll also notice that this one happens to be orthogonal for many of the higher order effects as well. If I were to try to fit the full response surface model, to add those terms to this model, of course I won't be able to add all of them, but I'm able to fit one additional term using my OMARS Design than I would be using the 13-run Definitive Screening Design. If I try to do that, you'll notice that the powers for the intercept, the main effects, and the quadratics are all higher for the OMARS Design; they are lower than the Definitive Screening Design for the interaction terms. Looking at the fraction of design space plot, you'll see that the OMARS Design has a higher maximum prediction variance, but is lower than the Definitive Screening Design over more than 80 percent of the design space. Interestingly, the Definitive Screening Design platform doesn't have the ability to generate 15-run, six-factor designs; we can generate 13 or 17 runs. If we can't afford 17 runs but can afford 15, this provides us perhaps an option that may be suited to us and that we'd like to consider or explore further. Once again, thank you all for your attention. At this point, I'd like to turn things back over to Phil. Thanks, Hadley. Just to summarize what you've seen there, what we've shown you: hopefully, you've seen how these Orthogonal Minimally Aliased Response Surface designs can bridge the gap between the small, efficient Definitive Screening Designs and large, high-powered, traditional response surface method designs. You've seen how there's more flexibility: there are Orthogonal Minimally Aliased Designs with three levels for different numbers of runs now. If a Definitive Screening Design doesn't meet your needs, or a traditional response surface method design doesn't meet your needs, you should now be able to explore these OMARS Designs. Exploring those designs is now made easier for you as a JMP user with the add-in that Hadley has created. We'll obviously post links to all of these things in the article in the community, and that's a great place to let us know if you've got any questions as well. Thanks very much for your attention.