Julianveda
Level III

Difference between "least squares" and "generalized linear model" in Fit Model

 

Hello Community,

 

I don't understand why I get different results when I analyze a small dataset in two ways: Fit Model with “Standard Least Squares”, followed by a log transformation (λ = 0) from the Box-Cox Transformation menu, versus the same Fit Model launch with “Generalized Linear Model”, distribution “Normal”, and link function “Log”.

 

See the results in the table below. I present the actual response values (the output variable column), then a second column with the predictions from least squares with the log transformation, and finally a third column with the predictions from the generalized linear model with Normal distribution and log link function. You can clearly see that the generalized linear model gives predictions closer to the actual values. I wonder why least squares with the log Box-Cox transformation does not give results as good as (or at least similar to) those from the generalized linear model with Normal distribution and log link function.

 

Julianveda_3-1685688335930.png

 

To be clear, the model terms are the same in both cases (two main effects and one interaction). Knowing what the difference between these two methods is, and whether I can freely choose one or the other, is crucial: as you can see in the results below, the significance of the terms also differs between the two. To be honest, the generalized linear model reflects the real situation we observe more accurately. However, I do not know whether I can use it freely here. I looked in the documentation but found only general information about some non-normal cases (binomial, count data, etc.) where the generalized linear model can be used; my question is a bit more specific.

 

Effect summary for the least squares method with log Box-Cox transformation (λ = 0)

Julianveda_4-1685688335931.png

 

Effect summary for the generalized linear model with Normal distribution and log link function

Julianveda_5-1685688335931.png

 

Below is the whole table, in case you wish to verify or test the methods:

 

Julianveda_6-1685688335932.png

 

 

Thank you for reading and I can provide further information if needed.

11 REPLIES
Victor_G
Super User

Re: Difference between "least squares" and "generalized linear model" in Fit Model

Hi @Julianveda,

 

Welcome to the Community!

 

Transforming the response with log is indeed not the same as using a GLM with log link. It's like comparing the average of the log response, versus the log of the average response.
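To make the "average of the log versus log of the average" point concrete, here is a small sketch in plain Python. The four response values are made up for illustration and are not taken from the dataset in this thread. Least squares on log(Y) models the mean of log(Y), so back-transforming its prediction gives something like a geometric mean, while a GLM with log link models log of the mean of Y, so it targets the arithmetic mean directly.

```python
import math
from statistics import mean

# Hypothetical response values (illustration only, not the thread's data)
y = [10.0, 20.0, 40.0, 80.0]

# What least squares on log(Y) estimates: the mean of the logs
mean_of_logs = mean(math.log(v) for v in y)

# What a log-link GLM estimates: the log of the mean
log_of_mean = math.log(mean(y))

# Back on the original scale
geometric_mean = math.exp(mean_of_logs)   # back-transformed log-scale mean
arithmetic_mean = math.exp(log_of_mean)   # equals mean(y)

print(f"mean of logs = {mean_of_logs:.4f}, log of mean = {log_of_mean:.4f}")
print(f"geometric mean = {geometric_mean:.2f}, arithmetic mean = {arithmetic_mean:.2f}")
```

The back-transformed value is never larger than the arithmetic mean (Jensen's inequality), which fits the pattern of the log-transformed least squares model predicting too low for the larger responses.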

Applying a non-linear (e.g., log, inverse) transformation to the dependent variables not only normalizes the residuals, but also distorts the ratio scale properties of measured variables.

In your example, we can see that the log transformation with a standard least squares model tends to underperform for larger Y values, because differences between large Y values become very small differences on the log scale (the log transformation "shrinks" them).

 

 

Victor_G_2-1685710492384.png

 

Example with rows 1 and 7 (the outputs differ by 30.43): the difference between the logs of the individual responses is 0.237, whereas the log of the difference between the two rows is 1.483 (these values correspond to base-10 logs).
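The arithmetic above can be reproduced with a few lines of Python. The data table is only available as a screenshot here, so the two response values below are stand-ins, chosen only so that their difference is 30.43 and the results match the quoted figures; they should not be read as the actual values in rows 1 and 7.

```python
import math

# Stand-in values (assumption): chosen so that y_row7 - y_row1 = 30.43
y_row1 = 41.90
y_row7 = 72.33

difference = y_row7 - y_row1

# Difference of the logs: what the log-transformed model "sees"
diff_of_logs = math.log10(y_row7) - math.log10(y_row1)

# Log of the difference: the gap measured on the original scale, then logged
log_of_diff = math.log10(difference)

print(f"difference          = {difference:.2f}")
print(f"difference of logs  = {diff_of_logs:.3f}")
print(f"log of difference   = {log_of_diff:.3f}")
```

A gap of about 30 units on the original scale turns into a gap of roughly 0.24 on the log scale, so a least squares fit on log(Y) gives it comparatively little weight.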

  

Fitting a GLM with a link function lets you stay on the original scale of the data: the link function turns the mean into a linear function of the predictor variables, and the variance function allows for variance heterogeneity in the analysis rather than trying to transform it away (for example through a log transform).
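As a rough illustration of the two fits outside JMP, the sketch below uses only the Python standard library on made-up data with a single predictor (the thread's model has two main effects and an interaction, so this is a simplification). It assumes that a GLM with Normal distribution and log link amounts to minimizing the squared error between Y and exp(b0 + b1*x) on the original scale, and solves that with a basic Gauss-Newton loop with step halving. This is not JMP's algorithm, and the function names are hypothetical.

```python
import math

# Hypothetical data (NOT the poster's table): roughly exponential in x
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [4.5, 5.6, 9.8, 12.9, 21.0, 28.5, 46.2, 65.0]

def sse(b0, b1):
    """Sum of squared errors on the original scale of y."""
    return sum((yi - math.exp(b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

def fit_log_ols():
    """Least squares on log(y): closed-form simple linear regression."""
    n = len(x)
    ly = [math.log(v) for v in y]
    xbar, lbar = sum(x) / n, sum(ly) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (li - lbar) for xi, li in zip(x, ly))
    b1 = sxy / sxx
    return lbar - b1 * xbar, b1

def fit_glm_normal_log(b0, b1, iters=50):
    """Gauss-Newton with step halving for min sum (y - exp(b0 + b1*x))^2,
    the fitting criterion of a Normal GLM with log link."""
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        r = [yi - mi for yi, mi in zip(y, mu)]
        # Normal equations (J'J) d = J'r, with Jacobian rows mu_i * [1, x_i]
        a11 = sum(m * m for m in mu)
        a12 = sum(m * m * xi for m, xi in zip(mu, x))
        a22 = sum(m * m * xi * xi for m, xi in zip(mu, x))
        g1 = sum(m * ri for m, ri in zip(mu, r))
        g2 = sum(m * ri * xi for m, ri, xi in zip(mu, r, x))
        det = a11 * a22 - a12 * a12
        d0 = (a22 * g1 - a12 * g2) / det
        d1 = (a11 * g2 - a12 * g1) / det
        # Halve the step until the error does not increase
        step, current = 1.0, sse(b0, b1)
        while step > 1e-8 and sse(b0 + step * d0, b1 + step * d1) > current:
            step /= 2
        if step <= 1e-8:
            break
        b0, b1 = b0 + step * d0, b1 + step * d1
    return b0, b1

ols = fit_log_ols()
glm = fit_glm_normal_log(*ols)  # start from the log-scale estimates
sse_ols, sse_glm = sse(*ols), sse(*glm)

print(f"log-OLS : b0={ols[0]:.4f}, b1={ols[1]:.4f}, SSE on original scale={sse_ols:.3f}")
print(f"GLM     : b0={glm[0]:.4f}, b1={glm[1]:.4f}, SSE on original scale={sse_glm:.3f}")
```

Because the second fit minimizes the error on the original scale directly, its original-scale error cannot be larger than that of the log-transformed fit it starts from. That is not a proof that the GLM is the "right" model: the two approaches also assume different error structures (constant variance on the original scale versus constant variance on the log scale), so the residual plots should guide the choice.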

 

I attached the data table and the scripts used for the comparison, in case other experienced users want to use the dataset for further explanations.

 

Some references for further reading:

  1. https://stats.stackexchange.com/questions/47840/linear-model-with-log-transformed-response-vs-genera...
  2. http://faculty.washington.edu/heagerty/Courses/b571/homework/Lindsey-Jones-1998.pdf
  3. http://www.leg.ufpr.br/~joel/Rmodelling/Slides/transforms.pdf
  4. https://www.frontiersin.org/articles/10.3389/fpsyg.2015.01171/full

@Mark_Bailey you can use the dataset I attached. It shows some patterns in the actual vs. predicted and residual plots:

Victor_G_0-1685711008971.png

Victor_G_1-1685711028961.png

 

I hope you'll better understand the difference between the two modeling techniques.

Victor GUILLER
Scientific Expertise Engineer
L'Oréal - Data & Analytics
Julianveda
Level III

Re: Difference between "least squares" and "generalized linear model" in Fit Model

Thank you @Victor_G  for this very interesting and insightful information.

 

To be honest, I do not understand everything in the links you provided, but if I understood correctly, the use of a GLM can be justified in my case.

 

Please correct me if I'm wrong.

 

Julian