Shad
Level II

How to analyze unbalanced data which three of independent factors are unbalanced levels?

Hello,

I have a data set with four predictors: three of them (time, distance, and power) are unbalanced, and one is balanced. Do you have any idea what the best way to analyze such data is, and how I can go about it? I would appreciate any help.

12 REPLIES
Peter_Bartell
Level VIII

Re: How to analyze unbalanced data which three of independent factors are unbalanced levels?

Every analysis 'how to' question should start with an articulation of the practical problem you are trying to solve. You haven't shared that with the Community. Please do.

 

Then comes a thorough evaluation of the means by which the data was collected. A designed experiment? Happenstance data? Historical data in time series? And what do you know about the measurement system and its variability?

 

Are there any issues associated with that process which make subsequent analysis problematic? Missingness, outliers, nonsense values? What about correlation of predictor variables? Then embark on analysis...at the highest level I have three thoughts for you:

 

1. Plot the data.

2. Plot The Data.

3. PLOT THE DATA...and this is where JMP shines.

Shad
Level II

Re: How to analyze unbalanced data which three of independent factors are unbalanced levels?

Hi Peter,

 

Thanks for your prompt response. Here is the problem articulation:

The experiment is a designed experiment. The objective of this study was to determine suitable combinations of IR heating duration, gap distance, and intensity, followed by tempering treatments, to maximize inactivation of mold spores. I did plot the data, and since it is not a balanced design there are some missing values. When I ran the full factorial analysis, the effect summary did not show any values.

I have read about the same issue from other people, and I found that I may need to analyze the data using a mixed model, but I am not sure which variables should be treated as fixed and which as random.

 

BTW, how can I share my questions with the community now?

 

Re: How to analyze unbalanced data which three of independent factors are unbalanced levels?

Thanks for the screen shot @Shad,

 

Based on the image, you have 15 terms in the model. If it is a DOE with 4 factors and a full factorial design, then that would mean you have 16 runs. Is that correct? If so, you would not have enough degrees of freedom left for error to test any of the effects/terms in the model (not enough data). You might want to start with stepwise regression to add terms in a forward stepwise fashion. Otherwise, remove terms that are most likely not going to have an impact (the fourth-order interaction, for example).
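
To see where the 15 comes from: a full factorial model has one term for every non-empty subset of the factors. Here is a minimal sketch in Python (not JMP). The factor names are placeholders, and "Tempering" is only my guess at the fourth factor.

```python
from itertools import combinations

# Placeholder factor names; "Tempering" is an assumed name for the fourth factor.
factors = ["Time", "Gap", "Intensity", "Tempering"]

def full_factorial_terms(names):
    """Return every main effect and interaction as a tuple of factor names."""
    terms = []
    for order in range(1, len(names) + 1):
        terms.extend(combinations(names, order))
    return terms

terms = full_factorial_terms(factors)
# Count the terms of each order (1 = main effect, 2 = two-way, ...).
by_order = {k: sum(1 for t in terms if len(t) == k) for k in range(1, 5)}

print(len(terms))  # 15
print(by_order)    # {1: 4, 2: 6, 3: 4, 4: 1}
```

If every factor had two levels, each term would cost one degree of freedom, so 16 runs minus 1 for the intercept minus 15 for the terms would leave 0 for error.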

 

Hope that helps.

 

Chris

Chris Kirchberg, M.S.2
Data Scientist, Life Sciences - Global Technical Enablement
JMP Statistical Discovery, LLC. - Denver, CO
Tel: +1-919-531-9927 ▪ Mobile: +1-303-378-7419 ▪ E-mail: chris.kirchberg@jmp.com
www.jmp.com

Re: How to analyze unbalanced data which three of independent factors are unbalanced levels?

Another possible approach rather than going forward is to specify a model with only main effects and two-way interactions. Remove the three-way and four-way interactions.

Dan Obermiller
Shad
Level II

Re: How to analyze unbalanced data which three of independent factors are unbalanced levels?

Thank you, Chris.

I have 4 main effects in total, but each factor has a different number of levels. For example, time has 6 levels, Gap has 3 levels, and so on. I attached a screenshot of my distribution analysis. Another thing is that when I ran multiple regression, some variables were correlated. Also, when I fit the full factorial, the VIFs were high. I am wondering whether these VIF values can be trusted, since I am not sure the full factorial model is right. As I said, some of the parameter estimates are missing.

 

Re: How to analyze unbalanced data which three of independent factors are unbalanced levels?

Thanks @Shad,

 

It looks like you have 78 runs total according to the image. A full factorial DOE would need 108 runs for 4 categorical factors with the levels you have specified (6x3x3x2).

 

I would follow @Dan_Obermiller's advice and only analyze the main effects and two-way interactions at this time. Otherwise you will not get any values for the parameter estimates (not enough data to fit the model you are trying to fit, since all factors are categorical).
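
The arithmetic behind that advice can be sketched in Python (not JMP). For categorical factors, a term uses the product of (levels - 1) over its factors. The level counts come from this thread, but which factor has 3 levels and which has 2 is an assumption, as is the name "Tempering".

```python
from itertools import combinations
from math import prod

# Levels per factor (6x3x3x2). The assignment of 3 and 2 levels to the
# last two factors, and the name "Tempering", are assumptions.
levels = {"Time": 6, "Gap": 3, "Intensity": 3, "Tempering": 2}

def n_parameters(levels, max_order):
    """Intercept plus the df of all terms up to max_order (categorical factors)."""
    total = 1  # intercept
    for order in range(1, max_order + 1):
        for combo in combinations(levels, order):
            total += prod(levels[f] - 1 for f in combo)
    return total

full = n_parameters(levels, 4)     # full factorial model
two_way = n_parameters(levels, 2)  # main effects + two-way interactions
n_runs = 78

print(full, two_way, n_runs)  # 108 44 78
```

The full factorial model needs 108 parameters, more than the 78 runs, while main effects plus two-way interactions need 44. That count assumes every pair of levels was actually run. Empty cells can still cost you estimates.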

 

Chris

Shad
Level II

Re: How to analyze unbalanced data which three of independent factors are unbalanced levels?

Dear Chris,

 

Thanks for the help and sorry for my delayed response. I had issues with my account access. 

If I consider just the main effects and two-way interactions, I still have some missing values. For example, if you look at my data distribution, not all levels of my predictors have the same number of observations, because it was not possible to run every combination of the predictors in my experiment. What if I fit a mixed model and put the predictors with uneven counts under random effects? Do you have any thoughts?

 

Best,

Re: How to analyze unbalanced data which three of independent factors are unbalanced levels?

Hi @Shad,

 

Ah, yes, some interactions will not work because there is no data. Another post had a similar issue:

 

https://community.jmp.com/t5/Discussions/How-to-do-three-way-Full-Factorial-ANOVA-with-unbalanced-da...

 

A mixed model is not needed. One would have to exclude those interactions that do not have data, as was done in the post above. Looking at Distribution, as you have, to figure out which combinations do not have data should help with choosing which interactions to include in the model.
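
If it helps, the same check can be done outside JMP by tabulating each pair of factors and listing the cells with zero runs. This is a minimal Python sketch. The data are made up for illustration and are not your runs, and `empty_cells` is a hypothetical helper name.

```python
from collections import Counter
from itertools import product

# Toy runs, invented for illustration only; not the poster's data.
runs = [
    {"Time": 10, "Gap": "near"},
    {"Time": 10, "Gap": "far"},
    {"Time": 20, "Gap": "near"},
    {"Time": 20, "Gap": "far"},
    {"Time": 30, "Gap": "near"},
    {"Time": 30, "Gap": "near"},
]

def empty_cells(rows, a, b):
    """List the (a, b) level combinations that have no runs."""
    counts = Counter((r[a], r[b]) for r in rows)
    levels_a = sorted({r[a] for r in rows})
    levels_b = sorted({r[b] for r in rows})
    return [cell for cell in product(levels_a, levels_b) if counts[cell] == 0]

missing = empty_cells(runs, "Time", "Gap")
print(missing)  # [(30, 'far')]
```

Any pair of factors that returns a non-empty list has level combinations with no data, so its interaction cannot be fully estimated.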

 

Does that help?

 

Chris

Shad
Level II

Re: How to analyze unbalanced data which three of independent factors are unbalanced levels?

Thank you very much for the link.

If I remove the interactions that have missing values, I would have to remove most of the interactions. Could you please suggest the best way to analyze the data if I am interested in the interactions "intensity by time", "intensity by Gap", and "time by Gap"?

 

Thanks again for all your help.