I have a very urgent question. Should I center a regressor before or after I exclude outliers in the dependent variable? The order changes the mean of the regressor: if I exclude first, the regressor values of the data points that are outliers on the dependent variable will not be included in the mean that is used for centering. Is there a convention for the order in which this should be done?
I hope I put my question in words correctly.
I would appreciate every hint very much.
Mathematically, it would make more sense to center on the data actually being used to fit the model (i.e. after removing the outlier). The point of centering is to reduce collinearity with higher order terms in the model, which increases the power of the t-test for each model coefficient (and, similarly, shortens the confidence intervals for the associated parameter estimates). The other potential reason is to have a specific interpretation for the parameter estimates. Most likely, it would make very little difference whether you center the effects with or without the outlier, since it probably won't change any correlations with the higher order terms by much.
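To make the collinearity point concrete, here is a minimal sketch in plain Python. The data, the outlier cutoff of 20, and the helper name `pearson` are all made up for illustration and have nothing to do with the poster's table. It centers the regressor on the rows kept after exclusion and compares the correlation between the regressor and its square before and after centering.

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

# Hypothetical data: regressor x and response y; the last row is a y outlier.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [2.1, 2.9, 4.2, 5.1, 5.8, 7.2, 40.0]

# Exclude the outlier first (the cutoff is purely illustrative).
keep = [i for i, v in enumerate(y) if v < 20]
x_kept = [x[i] for i in keep]

mean_all = mean(x)        # mean over all rows
mean_kept = mean(x_kept)  # mean over the rows actually used in the fit

# Center on the rows that go into the model.
x_c = [v - mean_kept for v in x_kept]

# Correlation between the linear and the quadratic term, raw vs. centered.
r_raw = pearson(x_kept, [v ** 2 for v in x_kept])
r_centered = pearson(x_c, [v ** 2 for v in x_c])

print(mean_all, mean_kept)
print(r_raw, r_centered)
```

In this toy example the kept values are symmetric around their mean, so centering removes the correlation with the squared term almost entirely. With real data the reduction is usually smaller, and whether the outlier row is included in the mean will typically matter much less than it does here.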
Curious though, are you centering them manually? If so, you really shouldn't need to because JMP handles that for you. In Fit Model, the default option is to center polynomials.
Thanks for your answer. I do center them manually. How do I get JMP to do it when using Fit Model? I use the linear mixed regression model like an ANOVA. That's why I will split my data in a second step to resolve an interaction. Would I center my regressor separately for the two groups? I feel like this would not be permissible, but mathematically reasonable.
You don't really need to do anything. If you want to make sure you have that option turned on, just look at the red-arrow menu in the dialog window for Fit Model. You should see that "Center Polynomials" is checked. The main effects themselves will not be centered, but they will be centered where they appear in interactions and quadratic terms. That's all you need to get the benefit of centering.
I'm not sure I understand what you mean by "linear mixed regression". Do you mean you have continuous and categorical factors? I ask because that might also be understood to mean a mixture of fixed and random effects. Either way, I don't understand the need to split the data to resolve an interaction. Can you explain that a bit more? You should be able to model any interactions directly, all in one model.
I split the data afterwards, comparable to pairwise comparisons, to show how the main effect works for the one subgroup and for the other subgroup. I have a 2x2 design. Another question: I need to find the best transformation. The Box-Cox transformation is not very useful in JMP, as it can only be used for positive data (in Fit Model, Standard Least Squares), but I have both negative and positive values. I would now compare log, square root, and reciprocal transformations (after adding 1 + minimum to the data). I already tried Box-Cox with data shifted to be positive, but since I get a different value every time I add a different constant to my data, I feel like this is not very reasonable.
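The dependence on the added constant can be reproduced with a small sketch in plain Python. It is a simple grid search over the standard Box-Cox profile log-likelihood, not JMP's implementation. The data, the two shift constants (4 and 104), and the function names `boxcox_loglik` and `best_lambda` are hypothetical and chosen only for illustration.

```python
import math

def boxcox_loglik(x, lam):
    """Box-Cox profile log-likelihood (up to a constant) for positive data x."""
    n = len(x)
    if lam == 0:
        y = [math.log(v) for v in x]
    else:
        y = [(v ** lam - 1) / lam for v in x]
    m = sum(y) / n
    var = sum((v - m) ** 2 for v in y) / n  # ML variance of the transformed data
    return -0.5 * n * math.log(var) + (lam - 1) * sum(math.log(v) for v in x)

def best_lambda(x, grid):
    """Grid value of lambda with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: boxcox_loglik(x, lam))

# Candidate lambdas from -3.0 to 3.0 in steps of 0.1.
grid = [i / 10 for i in range(-30, 31)]

# Hypothetical skewed data containing negative values and a zero.
raw = [-3.0, -2.5, -2.0, -1.0, 0.0, 2.0, 6.0, 15.0, 40.0]

# Two different constants that both make the data positive.
lam_small = best_lambda([v + 4 for v in raw], grid)
lam_large = best_lambda([v + 104 for v in raw], grid)

print(lam_small, lam_large)
```

The two shifts give different estimates because the Box-Cox estimate is unaffected by rescaling the data but not by shifting it. The constant is therefore effectively a second parameter of the transformation rather than a harmless preprocessing step.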
Can you post the data you are using? If you have intellectual property concerns, you could give the columns generic names. I think it would be much easier to help you if I could see what you are working with.
I attached the file. I want to analyse key_resp_1.rt from the table "Teilmenge von Data_total" (subset of Data_total) by kongruenz=1 and neg2=0 and their interaction. rating zentr is the regressor I asked about for centering.
The question about data transformations is for a second analysis of rating_val1.response as the dependent variable with the same fixed effects. The same goes for rating_2.response.
I wanted to do a reciprocal transformation, but this is bad because of the zero values, right? After adding a constant, the effects change completely. I would now compare log and square root. My supervisor told me that quadratic transformations shouldn't be used when there are negative values. Would that hold true for the square root, too, if I added a constant first?
I am really truly thankful for your support.
Thanks for posting that. It clears up a lot of confusion. Centering is not going to do anything for you in this case, since you are not asking to model any higher order effects that involve rating.response.
If you did include interaction with rating.response, you'd see the interactions with that effect are centered by the mean rating:
I'm sure you noticed a high degree of non-normality for that response. Based on the Box-Cox plot, I would recommend a power transform using the value -0.5 (reciprocal of square root). There's also a problem with independence: a very obvious trend in the residuals following the row order (does row order correspond to time order here?). Not sure what is going on there, but that is a big question that ought to be addressed. This residual plot shows the problem with the transform, but you can see it before the transform as well.
If you add a row-order column as an additional effect, you can account for that (though it still leaves some major questions). However, the row order is literally the only effect that is statistically significant.
Now, for your second question. You technically have ordinal data for the response. With a 7-point scale, you can sometimes get away with treating it as continuous, but there's probably not a good transform you can do. The more technically correct way to analyze that response would be ordinal logistic regression. To do that, set the modeling type for those responses to "Ordinal". Fit Model should then default to an ordinal logistic regression. You can specify the same model for your fixed effects. This model will output a probability for each rating value.
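As a rough illustration of what "a probability for each rating value" means, here is a minimal sketch of a cumulative-logit model in plain Python. The thresholds, the linear predictor value, and the sign convention P(Y <= k) = logistic(threshold_k - eta) are assumptions made for illustration. They are not estimates from the posted data and not necessarily the parameterization JMP reports.

```python
import math

def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def category_probs(eta, thresholds):
    """Per-category probabilities from a cumulative-logit model.

    Assumed convention: P(Y <= k) = logistic(thresholds[k] - eta),
    with thresholds in increasing order.
    """
    cum = [logistic(t - eta) for t in thresholds] + [1.0]
    probs, prev = [], 0.0
    for c in cum:
        probs.append(c - prev)  # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
        prev = c
    return probs

# Six hypothetical thresholds separate the seven rating categories.
thresholds = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]

# eta stands in for the fixed-effects linear predictor of one observation.
p = category_probs(0.3, thresholds)

for rating, prob in enumerate(p, start=1):
    print(rating, round(prob, 3))
```

Under this convention, a larger linear predictor shifts probability toward the higher ratings, and the seven probabilities always sum to one.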
I saved the scripts I used to the table.
Thanks so much for your time and your detailed response. I want to fit a linear mixed model with both participant and itemnumber as random effects (intercepts only). I cannot explain the trend with the row number; maybe it is because the table is sorted by keyresp at the moment? My supervisor recommended that I use the LMM for the rating data, even though technically it is ordinal. I have now calculated all models for resp.val with the different transformations and saved the residuals. As none of them follow a normal distribution, I would use log, since there the normal distribution is at least in third place among the recommended distributions. Or is there a way to calculate Box-Cox without adding a constant to my values so that they become positive?
Unfortunately I cannot attach the file.
Ok, the random effects were what I was missing. You wouldn't be able to incorporate those into an ordinal logistic regression using JMP anyway. You could do that with SAS (PROC GLIMMIX) or in R. I'm sure there are software implementations of Box-Cox with an offset parameter, but JMP doesn't do that. I would just add 4 to those responses if you want to use Box-Cox just to see whether a transform would even help. It actually kind of makes sense to do that with this data, because a 7-point Likert scale from 1 to 7 makes just as much sense as one from -3 to 3. I looked at the ordinary least squares models of those responses, and the non-normality is really not that bad when plotting the studentized residuals on a normal quantile plot. I wouldn't do anything as far as transforms go and would just use the responses as they are.