Hi @MachineHippo109,
The problem sounds pretty straightforward, but it will require you to explore your data using several different platforms within JMP: Generalized Regression (GenReg), decision trees, SVM, neural networks, KNN, PLS, and standard least squares, for example. I do not think your data structure is in the right format for time-series analysis/prediction. The Time Series platform requires a unique time identifier for each observation (row), so having shifts 1, 2, and 3 all fall on Day 1 (and so on) wouldn't work.
Based on the structure you shared, it appears that each system (1-12) operates for 0-8 hours during a given shift on a given day, and the power value is the total consumption across all of the systems.
When developing a model with the standard Fit Model platform, you might want to consult with your coworkers to see whether there is any reason to include crossed (interaction) terms. For example, perhaps you know that shift and a system's usage hours interact: maybe System5 always runs for at least 6 hrs during shift 3 but less than 4 hrs during the other two shifts. That would introduce an effect in which shift and system usage hours are intertwined. I do not recommend adding such terms unless you have prior evidence that the interactions exist.
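If it helps, here's a minimal JSL sketch of what launching Fit Model with one crossed term could look like. The column names (Power, Shift, System5 Hrs) are placeholders for whatever your table actually uses:

```
// Minimal sketch: Fit Model with a Shift * System5 Hrs interaction.
// Column names are placeholders -- substitute your own.
dt = Current Data Table();

obj = dt << Fit Model(
	Y( :Power ),                          // response: total power consumption
	Effects(
		:Shift,                           // main effect of shift
		:Name( "System5 Hrs" ),           // main effect of system usage hours
		:Shift * :Name( "System5 Hrs" )   // crossed (interaction) term
	),
	Personality( "Standard Least Squares" ),
	Run
);
```

You'd add one crossed term per suspected interaction, and the effect test for that term will tell you whether it belongs in the model.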
As mentioned previously, you'll want to partition the data into at least training and validation sets, and you'll probably want to stratify by both Power (in MWh) and Shift so that the training and validation sets are equally represented across shifts and maintain similar Power distributions. I've attached a mock data table with the same structure as yours, covering Jan and Feb. The validation column I made is stratified by both Power and Shift; you can explore the distributions to see how the data is partitioned: 75% training and 25% validation. You might even consider a 60/20/20 split (60% training, 20% validation, 20% test) and then split off the test data as a subset for comparing the different models.
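For what it's worth, the Make Validation Column utility (JMP Pro) can be scripted too. Here's a rough sketch of the 60/20/20 version; the argument names can vary a bit between JMP versions, so check the Scripting Index if it complains:

```
// Rough sketch: stratified 60/20/20 validation column (JMP Pro).
// See Help > Scripting Index > Data Table > Make Validation Column
// if your version expects slightly different arguments.
dt = Current Data Table();

dt << Make Validation Column(
	Training Set( 0.60 ),
	Validation Set( 0.20 ),
	Test Set( 0.20 ),
	Stratification Columns( :Power, :Shift )  // balance shifts, match Power distribution
);

// Split off the test rows as their own table for the final comparison.
// This assumes the new column is named "Validation" and codes Test as 2.
testRows = dt << Get Rows Where( :Validation == 2 );
testDt = dt << Subset( Rows( testRows ), Selected Columns( 0 ) );
```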
Ultimately, you'll need to explore models across several different platforms and optimize the fit within each one. Once you've generated several models, you'll want to see which performs best by evaluating each on the withheld "test" set -- data that was not used to train or validate any model. The best-performing model is the one you'd deploy for your company.
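If you save each model's prediction formula back to the table, that test-set comparison only takes a few lines of JSL. This sketch assumes the validation column is named "Validation" with Test coded as 2 and the saved prediction column is named "Pred Formula Power" -- adjust to your actual names:

```
// Sketch: RMSE of one saved prediction formula on the test rows.
// Assumes Validation codes Test as 2 and a saved column "Pred Formula Power".
dt = Current Data Table();
testRows = dt << Get Rows Where( :Validation == 2 );

actual    = Column( dt, "Power" )[testRows];
predicted = Column( dt, "Pred Formula Power" )[testRows];

resid = actual - predicted;
rmse  = Sqrt( Sum( resid :* resid ) / N Rows( resid ) );  // :* is elementwise multiply
Show( rmse );  // repeat per model; the smallest test RMSE wins
```

JMP Pro's Model Comparison platform will do the same comparison (and more) interactively if you'd rather not script it.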
To come back to your thread's title, it doesn't sound like you need a random effect in your model; I don't think you need to treat any of your factors as random.
Hope this helps!
DS