What is multiple linear regression?
Multiple linear regression explores the relationship between one continuous dependent (response) variable and two or more independent variables, and models how the dependent variable changes with those independent variables.
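As a point of reference, the model is usually written in the following form. The symbols are conventional notation rather than anything taken from the JMP output: y_i is the dependent variable for observation i, x_{i1} to x_{ip} are the p independent variables, the betas are the coefficients to be estimated, and epsilon_i is the random error.

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i ,
\qquad i = 1, \dots, n
```

Each coefficient \beta_j describes the expected change in y for a one-unit change in x_j while the other independent variables are held fixed.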
Using JMP to perform multiple linear regression
Using the data in Figure 1 as an example, we explore which factors may influence the physical health score.
Figure 1
In JMP, click the "Analyze" menu → "Fit Model" (Figure 2). In the dialog box that opens, assign the physical health score to "Y" and add the variables for the multivariable analysis to "Construct Model Effects" (Figure 3).
Figure 2
Figure 3 Step 2 of the "Fit Model" operation
As Figure 3 shows, there are two option boxes at the upper right of the dialog box: "Personality" and "Emphasis".
- When the dependent variable is continuous (a blue triangle is shown to the left of the variable name), "Personality" is automatically set to "Standard Least Squares", that is, linear regression.
- "Emphasis" defaults to "Effect Leverage". Choosing another value does not change the fitted results; it only changes how much of the report is displayed, and the displayed content can be adjusted afterwards. The default "Emphasis" setting is therefore fine.
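For readers who want to see what a standard least squares fit does outside JMP, here is a minimal sketch in plain Python. It is not JMP's implementation. The function name `fit_ols` and the data (`age`, `bmi`, `score`) are made up for illustration; the scores are generated without noise from known coefficients so that the result is easy to check.

```python
def fit_ols(predictor_rows, response):
    """Return least-squares coefficients [intercept, b1, b2, ...].

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    Illustrative only; real software uses more stable decompositions.
    """
    # Design matrix with a leading column of ones for the intercept.
    design = [[1.0] + [float(v) for v in row] for row in predictor_rows]
    p = len(design[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(design, response))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting: bring the largest entry in this column to the diagonal.
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(p):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


# Hypothetical data, NOT the data set shown in Figure 1.
age = [40, 50, 60, 70, 55]
bmi = [22, 27, 24, 30, 21]
# Noise-free scores built from known coefficients: 80 - 0.5*age + 0.2*bmi.
score = [80 - 0.5 * a + 0.2 * b for a, b in zip(age, bmi)]

intercept, b_age, b_bmi = fit_ols(list(zip(age, bmi)), score)
print(round(intercept, 4), round(b_age, 4), round(b_bmi, 4))
```

Because the made-up scores contain no noise, the fit recovers the coefficients used to generate them; with real data the estimates come with standard errors, as in the parameter estimates table later in this article.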
02 Interpreting the output
The default linear regression output consists of several parts. Only the most important ones are described below.
Part 1 Leverage plots
The first part is the leverage plot, also called the partial regression residual plot or the added-variable plot. It shows the relationship between one independent variable and the dependent variable after removing the influence of the other variables.
In this example the multivariable analysis includes 6 independent variables, so the output shows a leverage plot for each of the 6, each adjusted for the other 5. To save space, only the leverage plots of the first three independent variables are shown here (Figure 4).
Figure 4 Leverage plots (partial display)
Leverage plots look different for continuous variables and categorical variables.
- Continuous variables are easy to read. For example, the leverage plot for age shows that, after adjusting for the other factors, age is negatively associated with the physical health score.
- For categorical variables, the "Least Squares Means" table shows the differences between two or more categories. In the drinking table, after adjusting for the other factors, the least squares means of the physical health score are 48.1 for drinkers and 45.4 for non-drinkers. A least squares mean is a mean adjusted for the other variables in the model, so it is not the same as the ordinary group mean.
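The idea behind an added-variable plot for a continuous variable can be sketched in a few lines of Python. This is only an illustration of the principle and makes no claim about how JMP scales the axes of its leverage plots. The data and the helper names `simple_fit` and `residuals_after` are hypothetical. With a single other variable (here `bmi`), we remove its influence from both the score and age, then regress one set of residuals on the other.

```python
def simple_fit(x, y):
    """Return (intercept, slope) of the least-squares line y ~ x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return mean_y - slope * mean_x, slope


def residuals_after(x, y):
    """Residuals of y after removing its linear dependence on x."""
    b0, b1 = simple_fit(x, y)
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]


# Hypothetical data, NOT the data set from the article.
age = [40, 50, 60, 70, 55]
bmi = [22, 27, 24, 30, 21]
score = [80 - 0.5 * a + 0.2 * b for a, b in zip(age, bmi)]  # noise-free

# Remove the influence of BMI from both the response and age.
score_resid = residuals_after(bmi, score)
age_resid = residuals_after(bmi, age)

# Slope of the adjusted response on the adjusted predictor.
_, partial_slope = simple_fit(age_resid, score_resid)
print(round(partial_slope, 4))
```

The slope obtained this way equals the coefficient of age in the regression that includes both age and BMI, which is why the plot can be read as the effect of age "after adjusting for the other factors".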
Part 2 "Actual by Predicted" plot
The second part is the "Actual by Predicted" plot (Figure 5). It shows how well the model's predicted values agree with the observed values. The abscissa is the predicted value and the ordinate is the actual value. The closer the points lie to the 45-degree line, the better the model fits.
In this example R² = 0.42, which can be considered a reasonably good fit for a regression model in medical research.
Figure 5 "Actual by Predicted" plot
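To make the link between the predicted-versus-actual comparison and R² concrete, here is a small sketch. The function name `r_squared` and the five pairs of values are hypothetical and are not taken from Figure 5; the formula is the usual one, 1 minus the residual sum of squares divided by the total sum of squares.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical observed scores and model predictions.
actual = [40.0, 45.0, 50.0, 55.0, 60.0]
predicted = [42.0, 44.0, 50.0, 56.0, 58.0]

r2 = r_squared(actual, predicted)
print(round(r2, 2))
```

If every point fell exactly on the 45-degree line, the residual sum of squares would be zero and R² would equal 1.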
Part 3 Effect summary
The third part is the Effect Summary (Figure 6). It reflects the contribution of each variable, mainly through the LogWorth value, which equals -log10(P).
As can be seen from Figure 6, the largest contributors among the six independent variables are cardiac function class and dyspnea, while age, BMI, drinking and smoking have smaller effects. This result can intuitively reflect the contribution of the independent variables, and can also be used as a reference for further variable screening.
Figure 6 Summary of effects
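The LogWorth transformation is simple enough to check by hand. The sketch below uses only the Python standard library; the function name `logworth` and the example P values are chosen for illustration and are not read from Figure 6.

```python
import math


def logworth(p_value):
    """LogWorth = -log10(P). Smaller P values give larger LogWorth."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    return -math.log10(p_value)


# Illustrative P values only.
for p in (0.05, 0.01, 0.0001):
    print(p, round(logworth(p), 2))
```

On this scale a LogWorth above 2 corresponds to P < 0.01, which makes very small P values easier to compare than when they are shown as long decimals.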
Part 4 "Residual by Predicted" plot
The fourth part is the "Residual by Predicted" plot (Figure 7), a scatter plot with the predicted value of the dependent variable on the abscissa and the residual on the ordinate.
The residual plot indicates whether the model fit is reasonable. If the residuals are randomly scattered above and below the line y = 0, the model fits well. If they are not randomly distributed (for example, they show a clear trend), there may be a problem with the model, and you should reconsider whether the assumptions of the model are met.
In this example the residuals are fairly evenly distributed and show no obvious trend, so the model can be considered to fit the data well.
Figure 7 "Residual by Predicted" plot
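A rough numerical companion to reading the residual plot by eye is sketched below. The values are hypothetical and the helper `trend_slope` is introduced only for this example. It computes the residuals (actual minus predicted), their mean, and the slope of a straight line fitted through residual versus predicted value.

```python
def trend_slope(x, y):
    """Slope of the least-squares line y ~ x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx


# Hypothetical predictions and observations, NOT the values in Figure 7.
predicted = [42.0, 44.0, 50.0, 56.0, 58.0]
actual = [43.0, 43.0, 50.0, 55.0, 59.0]

residuals = [a - p for a, p in zip(actual, predicted)]
mean_residual = sum(residuals) / len(residuals)
slope = trend_slope(predicted, residuals)

print(residuals, mean_residual, slope)
```

A mean and slope near zero are consistent with residuals scattered around y = 0, but they cannot detect curvature or a spread that changes with the predicted value, so the plot itself should still be inspected.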
Part 5 Parameter estimates
The fifth part is the Parameter Estimates table (Figure 8), which is often the result of most interest. For each independent variable it gives the parameter estimate, standard error, t value and P value (shown in the output as "Prob>|t|").
For a continuous variable there is only 1 row of results. For a categorical variable the number of rows is the number of categories minus 1. For example, "cardiac function class" has 4 categories, so 4 - 1 = 3 rows are displayed. At first glance these look like comparisons of cardiac function classes 1, 2 and 3 with class 4 (the category that does not appear in the results).
Figure 8 Parameter estimates
Note, however, that JMP's default parameter estimates for categorical variables use effect coding, not dummy variable (indicator) coding. Most readers expect dummy variable coding when looking at categorical variables, and that is not what this table shows.
The difference is that dummy variable coding gives the difference between each other category and the reference category, whereas effect coding gives the difference between each category and the average across all categories. Therefore, when the independent variables include categorical variables, do not use the default Parameter Estimates results for them; use the Indicator Parameterization Estimates instead.
To output the more familiar dummy variable coding, click the red triangle to the left of the response "Physical Health Score" and choose "Estimates" → "Indicator Parameterization Estimates" (Figure 9).
Figure 9 Steps for "Indicator Parameterization Estimates"
The output now includes the indicator function parameterization, in which the parameter estimates for the categorical variables are the ones we actually want (Figure 10).
Figure 10 Indicator parameterization estimates
Taking cardiac function class as an example, the indicator parameterization results in Figure 10 show that, compared with cardiac function class 4 (the category not shown in the results is the reference category), the mean physical health score is 36.29 points higher for class 1, 22.22 points higher for class 2, and 11.54 points higher for class 3.
Comparing Figure 10 with Figure 8 shows that the results for the continuous variables are identical, while those for the categorical variables differ. So, once again: if the independent variables include categorical variables, use the indicator function parameterization results. That is how to perform a multivariable analysis with linear regression in JMP.
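The relationship between the two codings can be sketched with the three differences quoted above. The function names are hypothetical. The effect-coded values printed here are derived arithmetically from the Figure 10 differences and are not read from Figure 8, so they may differ from the table by rounding.

```python
def indicator_to_effect(indicator, reference):
    """Convert differences-from-reference into differences-from-average.

    `indicator` maps each non-reference level to its difference from the
    reference level; the reference level itself has a difference of 0.
    """
    full = dict(indicator)
    full[reference] = 0.0
    average = sum(full.values()) / len(full)
    return {level: value - average for level, value in full.items()}


def effect_to_indicator(effect, reference):
    """Convert differences-from-average back into differences-from-reference."""
    return {
        level: value - effect[reference]
        for level, value in effect.items()
        if level != reference
    }


# Differences from cardiac function class 4, as quoted from Figure 10.
indicator = {"class 1": 36.29, "class 2": 22.22, "class 3": 11.54}

effect = indicator_to_effect(indicator, "class 4")
for level, value in effect.items():
    print(level, round(value, 4))

# Converting back recovers the original differences.
round_trip = effect_to_indicator(effect, "class 4")
```

The effect-coded values sum to zero, which is why one category (here class 4) can be left out of the table: its value is the negative of the sum of the others. Both codings describe the same fitted model; only the comparison being reported differs.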
original: https://mp.weixin.qq.com/s/cfFuM9zrTgB_yr0pmgvoBQ