Re: Weighting Variables in regression models

dross111E · Jun 8, 2023 5:18 PM

I was wondering how to weight variables in a regression model. I have some variables that may contribute more to my model and I want to find a way to have JMP value these variables more so than other variables. I know I can enter them first into a stepwise regression model. I want to do that, but I also want to weight some variables more than others.

Mark_Bailey · Jul 14, 2020 06:58 PM

Weighting usually applies to observations, not variables, in regression. The Generalized Regression platform in JMP Pro offers several penalized regression techniques (a penalty is like an inverse weight) that shrink the parameter estimate for each effect. The weight is based on the data, though. It is not provided as another variable or user input.
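To make the "penalty is like an inverse weight" idea concrete, here is a minimal sketch in plain Python of a lasso fit by coordinate descent in which each predictor has its own penalty factor. This is an illustration only, not how JMP's Generalized Regression platform works (as noted above, JMP derives its weights from the data). The function names, the `penalty_factors` argument, and the toy data are all assumptions made up for the example.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def weighted_lasso(X, y, lam, penalty_factors, n_iter=200):
    """Coordinate descent for
    (1/(2n)) * sum((y - X b)^2) + lam * sum(penalty_factors[j] * |b[j]|).
    No intercept is fitted, so X and y are assumed to be centered.
    A smaller penalty factor means less shrinkage for that predictor."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of predictor j with the partial residual
            z = 0.0    # sum of squares of predictor j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam * penalty_factors[j]) / (z / n)
    return beta


# Hypothetical toy data: two uncorrelated predictors with equal true effects.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2, 0, 0, -2]

equal = weighted_lasso(X, y, lam=0.5, penalty_factors=[1.0, 1.0])
unequal = weighted_lasso(X, y, lam=0.5, penalty_factors=[0.2, 3.0])
print(equal)    # both coefficients shrunk by the same amount
print(unequal)  # first predictor barely shrunk, second shrunk to zero
```

With equal penalty factors both coefficients are shrunk equally. Giving the first predictor a small factor and the second a large one keeps the first close to its unpenalized value and removes the second from the model, which is the sense in which a penalty acts like an inverse weight.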

What is the basis for your determination of the weight of a variable? How did you determine that some variables have more weight and how much more weight? What is the form or nature of the weight for a variable?

dross111E · Jul 16, 2020 04:49 PM

We have brain volume measurements on 80 normal control subjects and 54 patients with traumatic brain injury (TBI). We are trying to create a model that will predict group membership (normal vs. TBI) based on brain volume measurements.

In our previous research, we found that our group of TBI patients had multiple brain volume abnormalities. So we wanted to use the effect sizes from that study to weight the brain volume variables for our predictive model, weighting variables proportionally to their effect sizes.

Is there a way to do that? Do you think it would help, above and beyond what the predictive models already do?

Mark_Bailey · Jul 17, 2020 09:52 AM

You are using 'linear regression' to model the response and predict the outcome. The linear predictor is a linear combination. The parameters or coefficients are weights in the sum or accumulation already. They are based on the empirical evidence provided by the data (observations). You might want to center and scale the predictor variables if they occur on very different scales (i.e., magnitude).
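As a small illustration of the two points above (the coefficients already act as weights in a sum, and predictors on different scales can be centered and scaled), here is a sketch in plain Python. The helper names and all numbers are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def standardize(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def linear_predictor(intercept, coefficients, row):
    """Weighted sum of the predictors: intercept + sum(b_j * x_j)."""
    return intercept + sum(b * x for b, x in zip(coefficients, row))


# Hypothetical volumes for one brain region, in arbitrary units.
raw_volume = [10.0, 20.0, 30.0]
scaled_volume = standardize(raw_volume)
print(scaled_volume)  # [-1.0, 0.0, 1.0]

# Hypothetical coefficients: each one is the weight of its predictor.
eta = linear_predictor(1.0, [2.0, -1.0], [0.5, 3.0])
print(eta)  # -1.0
```

After standardizing, the sizes of the coefficients are easier to compare across predictors, because each one describes the change in the linear predictor per standard deviation of its variable.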

The fact that you have more than one predictor that changes in concert with the patient outcome means that you have a collinearity problem, which inflates the variance of the parameter estimates. The estimates are unstable and may vary dramatically with even a small change in the data (e.g., one observation or new data). Centering and scaling the predictors helps but might not eliminate the problem. You might choose to use one dominant predictor to avoid this problem. If the information in these predictors is not entirely redundant, then you might consider using principal components analysis to synthesize new and independent predictors, which might also improve interpretation, depending on the loading of the original variables on the new predictors.
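One common way to quantify how much collinearity inflates the variance of an estimate is the variance inflation factor (VIF). For the special case of exactly two predictors it reduces to 1 / (1 - r^2), where r is their Pearson correlation. The sketch below, in plain Python, covers only that two-predictor case; the function names and the data are hypothetical.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(ss_a * ss_b)


def vif_two_predictors(a, b):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(a, b)
    if abs(r) >= 1.0:
        return float("inf")  # perfectly collinear: estimates are not unique
    return 1.0 / (1.0 - r ** 2)


# Hypothetical volumes for two regions that tend to change together.
region_a = [1.0, 2.0, 3.0, 4.0, 5.0]
region_b = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(region_a, region_b)
vif = vif_two_predictors(region_a, region_b)
print(round(r, 3), round(vif, 3))
```

A VIF of 1 means the predictors are uncorrelated and there is no inflation. The value grows without bound as the correlation approaches 1 or -1, which is the instability described above.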

Mark_Bailey · Jul 17, 2020 6:56 AM

Also, there are many other types of models for prediction. Two methods that I suggest are recursive partitioning and neural networks. The former is especially good for interpretation and the latter is especially good for accuracy.

There are others available in JMP Pro that can significantly lift the performance of prediction. Please see Help > JMP Documentation Library > Predictive Modeling guide.

statman · Jul 17, 2020 10:36 AM

Interesting study. In addition to @Mark_Bailey's points, I have some questions for you:

It appears you are trying to predict which category the subject will fall into (normal or TBI) based on brain volume.

Is there any way to make the categories more continuous or at least ordinal?
What do you mean by "group membership"? What metrics are used to do this categorization, or is it a subjective judgement? Have you considered measurement system errors in the categorization?
How do you measure brain volume, and have you considered the measurement system errors there as well?
What are brain volume abnormalities? What measurements were used to determine these? Are there other measures that could be used instead of brain volume (e.g., density, dimensions, weight, EEG)?
What are the "brain volume variables"?
What do the predictive models already do? What are those models?

As already suggested, the model will have parameters associated with each variable. These parameters are already "weighted" in terms of their effect on the response variable.

"All models are wrong, some are useful" G.E.P. Box
