Solved: Re: How to take events into account correctly in a model?

Jun 8, 2023 5:48 PM

Hello everyone,
I would like your advice on a modeling problem.

I have a process with a response Y and parameters X1, X2, X3, etc.
The process was studied over a long period of 3 years, and I would now like to pool all the data to analyze the process as a whole. The problem is that the machine associated with the process underwent several changes, and even some incidents, during these 3 years. Each one was identified (Evt_A, Evt_B, Evt_C, etc.) and has a specific date. To get a good model of Y, I would like to account not only for the parameters X1, X2, X3, etc. but also for the events A, B, C, etc., to find out whether they had a real influence and to quantify it. For the time being, I have introduced the events as categorical columns with the values -1 and +1, where -1 means "before the event" and +1 means "after the event".

I ask myself the following questions:
1. Is this the only way to proceed?
2. Is this the right way to proceed?
3. How do I correctly introduce these events in the Fit Model platform? Should I use specific attributes? Are interactions possible?

I thank you in advance for all the help you can give me.
Sincerely,
Stéphane GEORGES

dale_lehman · Apr 23, 2022 08:46 AM

One question I have is whether the events are all different or can they be aggregated? Personally, I would code the events as (0,1) and if they can be treated the same, then the event can be reduced to a single variable indicating whether it was an event time or not. If they are all different, then I suspect you need different indicator variables for each event.

Another question is whether the event is confined to the time at which it happens, or are time periods before or after the event time subject to being influenced by the event. In other words, if the proximity to the event time matters, then I would code all time periods in terms of how far they are from an event time (e.g. -3, -2, -1, 0, 1, 2, 3). If you can provide more detail about the nature of the events and their effects, then I think it would be easier to discuss appropriate modeling choices.

I've used event analysis - but applied to regulatory events rather than machine events. In that setting, it is important to not focus too much on the actual event time, since regulatory "events" are often anticipated to a large degree. When a political "event" happens that is not foreseen, then the indicator variable approach makes more sense. I would suspect that machine events are more discrete, but it still depends on the nature of those "events."
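To make the "distance from the event" coding concrete, here is a minimal sketch. The function name, the period numbering, and the clipping window are illustrative assumptions, not something specified in the post.

```python
def relative_period(period, event_period, window=3):
    """Signed distance (in periods) from the event, clipped to [-window, window].

    Clipping is an assumption of this sketch: periods far from the event
    are lumped into the outermost codes.
    """
    return max(-window, min(window, period - event_period))


# Hypothetical example: periods 1..9, with the event occurring in period 5.
codes = [relative_period(p, event_period=5) for p in range(1, 10)]
print(codes)
```

A column built this way would enter the model as an ordinal or categorical factor, so that each distance from the event gets its own estimate.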

Steph_Georges · Apr 23, 2022 10:07 AM

Hi Dale,

Thank you for your answer. I will try to be more explicit.

I have a laser welding process that I am trying to model. Over the last 3 years, I have performed over 25 DoEs on our machine with materials of various grades and thicknesses. I have gathered all this knowledge and experience in order to create a welding model to properly preset the machine.

During these 3 years, the machine continued to evolve technologically in small steps. At different times, changes (e.g. in design) took place that may or may not have affected the laser welding phenomenon. Thus, not all the tests were performed under the same conditions. I would like to take these changes into account in order to evaluate their impact and, as far as possible, to remove it so as to recover the effects of the pure welding parameters. Even though each event occurs at a single point in time, its effect persists afterwards. This is why I was talking about before and after. I agree with coding the effects as 0/1 (0: effect not present, 1: effect present).

Thus, in time we could have:

Time: ------------T1-----------T2------------------------
A:    000000000000111111111111111111111111111111111111111
B:    000000000000000000000000011111111111111111111111111

Before T1, the tests are independent of the effects of events A and B. After T1, the effect of event A must be taken into account in the tests. After both T1 and T2, the effects of A and B must be taken into account. The dates are not important in themselves; it is the effects of A, B, etc. that interest me and that I would like to evaluate.
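The persistent 0/1 coding described above can be derived directly from the run dates. The sketch below is only an illustration: the event dates, run dates, and function name are hypothetical, since the thread gives no actual dates.

```python
from datetime import date

# Hypothetical event dates standing in for T1 and T2.
EVENT_DATES = {"A": date(2020, 3, 1), "B": date(2021, 1, 15)}


def step_indicators(run_date, event_dates=EVENT_DATES):
    """Return {event: 0 or 1}; 1 means the run took place on or after the event."""
    return {name: int(run_date >= when) for name, when in event_dates.items()}


# Three made-up runs: before T1, between T1 and T2, after T2.
runs = [date(2019, 11, 5), date(2020, 6, 20), date(2021, 4, 2)]
rows = [step_indicators(d) for d in runs]
for d, row in zip(runs, rows):
    print(d, row)
```

Each event then becomes one column whose value switches from 0 to 1 at the event date and stays at 1, which matches the timeline above.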

dale_lehman · Apr 23, 2022 11:14 AM

If the events are in some sense "equal" I would simply record the number of events that apply at each point in time. However, it seems likely that each event has its own distinct characteristics. This would mean you need a more complex feature that recognizes the series of effects present at each point in time. You can create such a variable by concatenating all the 0/1 event data at each time point - however, I suspect you don't have enough data to model the large number of features that would result. I think the key to reducing the number of features lies in the fact that the events are cumulative - that is, they are added on top of previous events. If you are willing to only care about the number of events (back to my first point), then it would simply be the number of events at each time point. You might be able to identify a few "key" events that are different/more important than the others and use indicator variables for those key events - in conjunction with the cumulative number of events at each time point.
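As a rough sketch of the three feature choices mentioned here (concatenated pattern, cumulative count, and indicators for a few key events), assuming each run already carries its 0/1 event indicators. The function and feature names are made up for illustration.

```python
def event_features(indicators, key_events=()):
    """Derive summary features from one run's 0/1 event indicators.

    pattern  : concatenated 0/1 string (one category per event combination)
    n_events : cumulative number of events in effect for this run
    key_<X>  : separate indicator kept for each event named in key_events
    """
    names = sorted(indicators)
    features = {
        "pattern": "".join(str(indicators[n]) for n in names),
        "n_events": sum(indicators.values()),
    }
    for name in key_events:
        features[f"key_{name}"] = indicators[name]
    return features


# Hypothetical run performed after events A and B but before event C.
run = {"A": 1, "B": 1, "C": 0}
features = event_features(run, key_events=("B",))
print(features)
```

With strictly cumulative events, the number of distinct patterns is only the number of events plus one, so the concatenated variable stays small in that case.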

Someone else may have a better suggestion for this type of data. I haven't worked with anything like this.

Steph_Georges · Apr 24, 2022 02:26 PM

Hi Dale,

My events are indeed not of equal importance. Intuitively, some may affect my response only slightly and others a little more. After some tests, using the number of events alone does not help ... but it was a great point and worth testing.

Thanks for your support.

Victor_G · Apr 25, 2022 6:36 AM

Hello @Steph_Georges,

If you want to aggregate all the DoE data you have collected on this equipment while recording the potential changes in machine settings (between DoEs/time periods), you could introduce a new column such as "change" or "setting" (categorical by default, unless you have enough information to characterize the changes more precisely through continuous numerical values) and, for each DoE or time period, indicate under which condition it was run.
Example: DoEs 1 to 14 at setting "A" (initial equipment settings), then DoEs 15 to 20 at "B", 20 to 24 at "C"...

I see two options for modeling/analysis, depending on how you want to treat this "change" parameter in your model (either as a fixed effect or as a random effect):
- If you have only a limited number of changes that are reproducible (and no other future changes in settings are possible), you can treat this new column as a fixed effect in your analysis. You can then analyze it through "Fit Least Squares", "Stepwise" or "Generalized Regression" (JMP Pro) together with your other effects. You will get an estimate for each level of change you have encountered, so you will be able to quantify quite precisely the impact of the changes on the response(s) (for example, how much a change of setting from A to B increases or decreases the response(s)).
- If these changes are only a small part of the possible (and future) changes on the equipment, and you are therefore more interested in knowing how much variability the changes in equipment settings introduce, you can treat this new column as a random effect and analyze your results through a "Mixed Model" or "Generalized Mixed Model" (JMP Pro) by declaring this variable as a random effect. JMP will then provide a REML analysis (variance component estimates, which indicate whether a significant share of the response variance is attributable to changes in settings).
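The difference between the two options can be illustrated on a toy example. This is only a sketch under strong assumptions: the data are made up, the process factors X1, X2, ... are ignored, the groups are balanced, and the variance component is computed with the classical ANOVA (method-of-moments) estimator rather than the REML fit that JMP performs.

```python
from statistics import mean

# Made-up responses grouped by machine setting (balanced: 4 runs each).
y_by_setting = {
    "A": [10.1, 9.8, 10.3, 9.9],
    "B": [10.9, 11.2, 10.8, 11.1],
    "C": [10.4, 10.6, 10.2, 10.5],
}


def fixed_effect_shifts(groups, reference="A"):
    """Fixed-effect view: mean shift of each setting relative to the reference."""
    ref = mean(groups[reference])
    return {name: mean(values) - ref for name, values in groups.items()}


def variance_components(groups):
    """Random-effect view: ANOVA estimator for a balanced one-way layout."""
    sizes = {len(values) for values in groups.values()}
    assert len(sizes) == 1, "this sketch only handles balanced groups"
    n, k = sizes.pop(), len(groups)
    grand = mean(x for values in groups.values() for x in values)
    ms_between = n * sum((mean(v) - grand) ** 2 for v in groups.values()) / (k - 1)
    ms_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v) / (k * (n - 1))
    return {"between": max(0.0, (ms_between - ms_within) / n), "within": ms_within}


shifts = fixed_effect_shifts(y_by_setting)
components = variance_components(y_by_setting)
print(shifts)
print(components)
```

The fixed-effect view answers "how much did the response move from A to B?", while the random-effect view answers "how much run-to-run variance do setting changes add in general?".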

There was an interesting discussion about fixed vs. random effects here: Random vs Fixed Blocking Factor in DOE

This could be a first approach in analysis before going into more complex modeling if/when needed.

I hope this gives you some new ideas for your analysis.

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

Steph_Georges · Apr 24, 2022 03:09 PM

Hi Victor,
I had originally introduced several columns because I did not think that one column could represent the situation due to the cumulative nature of the effects. Nevertheless, I proceeded as advised (with your notations):
DoE_01 A <= initial situation
DoE_02 A
...
DoE_10 A
DoE_11 A+B <= B is added (effect that I consider limited but significant)
DoE_12 A+B
...
DoE_16 A+B+C <= C is added to A and B (same remark)
...
I introduced the new column as a fixed effect and proceeded, for the time being (I still need to review your other suggestions in detail), with a simple analysis using "Fit Least Squares". With this technique, it is not the estimate of each level itself that is of interest, but the size of the change between A and (A+B) (to get the effect of B), between (A+B) and (A+B+C) (to get the effect of C), etc. Of course, if there are interactions between the events, this technique does not work (as a first approximation, I assume there are none).
For now, the profiler shows consistent results:
- the positive or negative shifts in the response associated with the event column are at least physically explainable
- the effects of the parameters X1, X2, etc. that govern the process, once corrected for these small effects, are also more consistent.
This is excellent news! Still, something bothers me about the cumulative nature of the effects. I can't quite put my finger on it, but I'm not sure that I am accounting for it correctly this way.
Nevertheless, many thanks for your help.
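On the question of whether one cumulative column handles the effects correctly, a small numerical check may help. The sketch below uses made-up, noise-free data with assumed event effects and NumPy least squares (not JMP). It suggests that, when events only accumulate and there are no interactions, a single column with levels A, A+B, A+B+C and a set of persistent 0/1 step columns are two parameterizations of the same model. The reference-level coding is an assumption of this sketch; JMP may parameterize nominal effects differently, but differences between level estimates do not depend on the coding.

```python
import numpy as np

# Made-up, noise-free data: 3 runs per setting, one process factor x1.
x1 = np.array([0.0, 1.0, 2.0, 0.5, 1.5, 2.5, 0.0, 1.0, 3.0])
setting = ["A"] * 3 + ["A+B"] * 3 + ["A+B+C"] * 3
true_b, true_c = 0.4, -0.3  # assumed effects of events B and C

# Persistent step columns: 1 from the event onward.
step_b = np.array([s in ("A+B", "A+B+C") for s in setting], dtype=float)
step_c = np.array([s == "A+B+C" for s in setting], dtype=float)
y = 2.0 + 1.5 * x1 + true_b * step_b + true_c * step_c

ones = np.ones_like(x1)

# Parameterization 1: one categorical column, coded with "A" as reference level.
level_ab = np.array([s == "A+B" for s in setting], dtype=float)
level_abc = np.array([s == "A+B+C" for s in setting], dtype=float)
X_levels = np.column_stack([ones, x1, level_ab, level_abc])
coef_levels, *_ = np.linalg.lstsq(X_levels, y, rcond=None)

# Parameterization 2: one persistent 0/1 step column per event.
X_steps = np.column_stack([ones, x1, step_b, step_c])
coef_steps, *_ = np.linalg.lstsq(X_steps, y, rcond=None)

effect_b_from_levels = coef_levels[2]                   # (A+B) minus A
effect_c_from_levels = coef_levels[3] - coef_levels[2]  # (A+B+C) minus (A+B)
print(effect_b_from_levels, effect_c_from_levels)
print(coef_steps[2], coef_steps[3])
```

The differences between consecutive levels reproduce the step coefficients, and the x1 coefficient is the same in both fits. The two codings would only diverge if the events could be switched off again or combined in other orders.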

Victor_G · Apr 25, 2022 07:54 AM

Hi @Steph_Georges,

Great if this solution can help you figure out the possible impacts of events on your different responses.

The third idea from @statman is also very interesting: if you build a separate model for each subset of your DoE dataset run under the same conditions (one model for the responses obtained with setting A, another for setting A+B, another for setting A+B+C...), do your factors have the same or a similar impact on your response whatever the setting? Are the estimates "close enough", or do you see a change in magnitude and/or sign (for example, opposite effects depending on the setting)?
You could compare each of these models with the "big" model that includes "setting" as a fixed effect, to see how the two types of models behave and which makes more physical sense to you.
And last but not least, you can also try to model the "change of setting" as a random effect and compare it with the two previous types of models. That should give you a good overview of what is possible as a first data analysis/discovery.

Have fun !

Victor GUILLER

statman · Apr 24, 2022 09:24 AM

My thoughts...

1. Why were the changes done to the machines? Did your previous "numerous" experiments indicate these changes should be made? Can the changes be undone? Did the changes have a significant enough effect to change the "system"?

2. If these changes do have an effect, then the relationships of your other factors, developed before the changes, may be completely different. If there is any interaction of the design factors with the changes made, then using the data prior to the changes may not be useful (unless you are willing to change back). In essence, you may have changed the inference space and the previous relationships may not be useful at all. In which case, move on...

3. I might try creating a column coding the changes over the time sequence (1, 2, 3...) and doing fit model by that column. Are the factor effects consistent over those changes?
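A minimal sketch of that check, outside JMP and with made-up data: fit the same simple model separately for each change code and compare the factor effect across codes. Only one factor is used here, and the helper name and run values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def slope(xs, ys):
    """Least-squares slope of y on x for one subset of runs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


# Made-up runs: (change code, factor X1, response Y).
runs = [
    (1, 0.0, 1.0), (1, 1.0, 3.0), (1, 2.0, 5.0),
    (2, 0.0, 1.5), (2, 1.0, 3.5), (2, 2.0, 5.5),
    (3, 0.0, 1.0), (3, 1.0, 0.0), (3, 2.0, -1.0),
]

groups = defaultdict(lambda: ([], []))
for code, x, y in runs:
    groups[code][0].append(x)
    groups[code][1].append(y)

slopes = {code: slope(xs, ys) for code, (xs, ys) in groups.items()}
print(slopes)
```

In this toy data, codes 1 and 2 share the same X1 effect with only a shifted level, whereas code 3 reverses its sign. The latter pattern would point to an interaction between the change and the factor, which is the case where pooling the older data becomes questionable.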

"All models are wrong, some are useful" G.E.P. Box

Steph_Georges · Apr 25, 2022 10:40 AM

Hi statman,

Thanks a lot for your answer.

1. Concerning the changes, they were necessary from a mechanical point of view to improve the welding machine. To my great misfortune :-), they cannot be undone. For some changes, it is reasonable to think that they affect the process slightly. The challenge is to quantify this "slightly".

2. I fully agree with you. Unfortunately, all these tests are expensive (in time and energy) and it is not reasonable to think about doing them again. I try to do the best I can with the available data, trying to erase as much as possible the undesirable effects in order to obtain a "wrong but useful" model.

3. I have to try your suggestion. Thanks a lot for the idea.