In this paper, we review the most prominent methods for estimating the parameters of the Poisson regression model when the data suffer from a semi-perfect multicollinearity problem, namely ridge regression and the Liu estimator. The estimation methods were applied to real data obtained from the Central Child Hospital in Baghdad, representing the number of cases of congenital defects of the heart and circulatory system in children for the period from 2012 to 2019. The results showed the superiority of the Liu estimator method over the ridge regression method, with the Akaike information criterion (AIC) as the criterion for comparison.

Keywords: Poisson regression, Liu estimators, Multicollinearity problem.

Hello everyone,

my name is Raaed Fadhil Mohammed.

I am a statistician. I am a lecturer at the University of Mustansiriyah.

The title of my paper is Estimating the Parameter of Poisson Regression Model Under

the Multicollinearity Problem.

Outline: Poisson Regression Model, Multicollinearity Problem,

Ridge Regression Estimator Method,

Liu Estimator Method, Real Data Example,

Conclusions, and References.

Poisson Regression Model.

It is one of the types of regression model that fall under the log-linear

regression models, because taking the natural logarithm

of the distribution formula turns it into a linear form.

Random errors in the model follow a Poisson distribution with parameter

mu.

The model is based on two essential assumptions: the distribution of the random errors,

which differs from the distribution of random errors in the linear regression

model, and the property that the Poisson distribution parameter mu is a function

of the predictor variables.
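For reference, the model just described can be written out as below. This is the standard textbook form of Poisson regression with a log link, not a formula taken from the slides; the symbols $x_i$ (the vector of predictor values for observation $i$) and $\beta$ (the coefficient vector) are notation assumed here.

```latex
\begin{aligned}
P(Y_i = y_i) &= \frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!}, \qquad y_i = 0, 1, 2, \dots \\
\mu_i &= \exp\!\left(x_i^{\top}\beta\right)
\quad\Longleftrightarrow\quad
\ln \mu_i = x_i^{\top}\beta
\end{aligned}
```

The second line is the sense in which taking the natural logarithm makes the model linear in the parameters.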

Multicollinearity Problem.

Multicollinearity problems occur when two or more predictor variables are linked

by a strong linear relation, so that in practice it is difficult to separate the effect

of each predictor variable on the dependent variable.

They also occur when the value of one of the predictor variables depends on one or more

of the other predictor variables in the model under study,

as well as when the data take the form of time series or cross-section data.

The multicollinearity problem can be classified into two types:

Number 1, perfect multicollinearity.

The determinant of the information matrix is zero, that is, the determinant of X transpose X equals

zero.

It follows that it is impossible to estimate

the parameters of the regression model, due to the inability to calculate

the inverse of the matrix X transpose X.

Since X transpose X cannot be inverted in this case, the best method is to make use of

principal component analysis.

Number 2, semi-perfect multicollinearity.

In this case, the value of the determinant of the information matrix

is very small, close to zero,

so the estimated parameters have considerable variance.

The best method in this case is to use the ridge regression method or the Liu

estimator method.
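The two cases can be illustrated with a toy calculation. The sketch below uses made-up two-column data (not the hospital data) and the plain matrix X transpose X, as named in the talk; the helper names `gram` and `det2` are hypothetical and only serve this illustration.

```python
def gram(X):
    """Return X'X for a matrix X given as a list of rows."""
    p = len(X[0])
    return [[sum(row[a] * row[b] for row in X) for b in range(p)]
            for a in range(p)]


def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


# Perfect multicollinearity: the second column is exactly 2 * the first.
X_perfect = [[1, 2], [2, 4], [3, 6]]

# Semi-perfect multicollinearity: the second column is almost,
# but not exactly, 2 * the first (made-up perturbation).
X_near = [[1, 2.01], [2, 3.99], [3, 6.01]]

det_perfect = det2(gram(X_perfect))
det_near = det2(gram(X_near))

print("perfect case, det(X'X):", det_perfect)
print("semi-perfect case, det(X'X):", det_near)
```

In the first case the determinant is exactly zero, so no inverse exists; in the second it is positive but tiny, so the inverse exists and has very large entries.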

The variance-covariance matrix of the estimated parameters

can be expressed by the following formula.
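The slide formula is not captured in the transcript. A common way to write the asymptotic variance-covariance matrix of the maximum likelihood estimates in Poisson regression is given here; it should be read as an assumed, standard form rather than the speaker's exact notation. The matrix $\hat{W}$, a diagonal matrix of fitted means, is notation introduced for this sketch.

```latex
\operatorname{Cov}\!\left(\hat{\beta}_{ML}\right)
  = \left(X^{\top}\hat{W}X\right)^{-1},
\qquad
\hat{W} = \operatorname{diag}\!\left(\hat{\mu}_1, \dots, \hat{\mu}_n\right)
```

When the determinant of $X^{\top}\hat{W}X$ is close to zero, the entries of this inverse become large, which is the considerable variance mentioned for the semi-perfect case.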

Perhaps the best statistical measure of the intensity of

multicollinearity is the variance inflation factor, VIF, whose formula is as follows.

VIF equals one over one minus R squared.

R squared here is the coefficient of determination.
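A minimal sketch of the VIF formula as stated, assuming that R squared comes from regressing one predictor on all the other predictors (the usual convention). The R squared values in the loop are illustrative, not values from the study.

```python
def vif(r_squared):
    """Variance inflation factor for one predictor, given the R^2 obtained
    by regressing that predictor on all the other predictors."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R^2 must lie in [0, 1) for a finite VIF")
    return 1.0 / (1.0 - r_squared)


# Illustrative R^2 values only.
for r2 in (0.0, 0.5, 0.75, 0.95):
    print(f"R^2 = {r2:.2f} -> VIF = {vif(r2):.2f}")
```

The closer R squared gets to one, the faster the VIF grows, which is why it is a convenient measure of how severe the multicollinearity is.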

Ridge Regression Estimator Method.

It is one of the important alternatives for estimating the parameters

of the regression model when there is multicollinearity between the predictor

variables.

This method was established according to the principle of the researchers Hoerl

and Kennard, which is to add a small positive quantity to the main diagonal

elements of the information matrix.

The ridge regression estimators are biased when k is greater than zero, and the amount

of bias can be expressed by the formula Z minus the identity, times beta.
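The description above matches the form in which the Poisson ridge estimator is usually written. The block below is a sketch under that assumption; $\hat{W}$ (the diagonal matrix of fitted means) and the subscript on $Z_k$ are notation introduced here, and the expectation is meant in the usual asymptotic sense.

```latex
\begin{aligned}
\hat{\beta}_{RR} &= \left(X^{\top}\hat{W}X + kI\right)^{-1} X^{\top}\hat{W}X\,\hat{\beta}_{ML}
                  = Z_k\,\hat{\beta}_{ML}, \qquad k > 0 \\
\operatorname{Bias}\!\left(\hat{\beta}_{RR}\right) &= E\!\left(\hat{\beta}_{RR}\right) - \beta
                  = \left(Z_k - I\right)\beta
\end{aligned}
```

With $k = 0$ the matrix $Z_k$ is the identity, so the estimator reduces to maximum likelihood and the bias vanishes.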

Liu Estimator Method.

The researcher Liu (1993) laid the foundations of this method to address

the issue of the variance inflation of the estimated parameters

in the presence of a multicollinearity problem.

The Liu estimator for the parameters of the Poisson regression can be expressed

by the following formula.
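The formula shown on the slide is not in the transcript. The form usually given for the Poisson Liu estimator is written out here as an assumption; $\hat{W}$ (the diagonal matrix of fitted means) and $Z_d$ are notation introduced for this sketch.

```latex
\hat{\beta}_{d} = \left(X^{\top}\hat{W}X + I\right)^{-1}
                  \left(X^{\top}\hat{W}X + dI\right)\hat{\beta}_{ML}
                = Z_d\,\hat{\beta}_{ML},
\qquad 0 < d < 1
```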

The Liu estimators are also biased when d is greater than zero, and the magnitude

of the bias is Z minus the identity, times beta.

The reason for the bias is the added value

d, which ranges between zero and one.

Also, the mean squared error calculated according to the Liu estimator method

is less than the mean squared error for the same parameters if estimated

according to the maximum likelihood method.
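The mean squared error claim can be checked for a single coefficient. The sketch below assumes the standard canonical form, in which the information matrix is diagonalised, each coefficient `alpha` is paired with an eigenvalue `lam`, and the maximum likelihood variance of that coefficient is `1 / lam`. The numbers are hypothetical and chosen to mimic a near-zero eigenvalue; they are not from the study.

```python
def mse_ml(lam):
    """MSE of one ML coefficient in canonical form (pure variance, no bias)."""
    return 1.0 / lam


def mse_liu(lam, alpha, d):
    """MSE of the matching Liu coefficient: shrunken variance plus squared bias."""
    shrink = (lam + d) / (lam + 1.0)   # Liu shrinkage factor for this component
    variance = shrink ** 2 / lam       # ML variance scaled by shrink^2
    bias = (shrink - 1.0) * alpha      # equals (d - 1) / (lam + 1) * alpha
    return variance + bias ** 2


# Hypothetical values: a near-zero eigenvalue, as under severe multicollinearity.
lam, alpha, d = 0.01, 1.0, 0.5

print("ML  MSE:", mse_ml(lam))
print("Liu MSE:", mse_liu(lam, alpha, d))
```

In this form, setting d to one gives back the maximum likelihood value, and the gain from a smaller d comes from trading a large drop in variance for a small squared bias. Whether the Liu value is lower depends on d, the eigenvalue and the size of the coefficient.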

Real Data Example.

We obtained real data concerning congenital defects of the heart

and circulatory system in newborns from the Central Child Teaching Hospital

in Baghdad, Iraq, and studied the distribution of the dependent

variable y, which represents abnormalities of the heart and circulatory system

in children.

We also revealed the existence of a multicollinearity problem among

the predictor variables under study.

The cases of congenital disabilities arriving at the Central Child Teaching

Hospital are recorded in a form prepared by the Statistics Division of the hospital

as count data and totals within semi-monthly periods.

The sample was taken for the period from 2012 to the end of 2019,

and a Poisson regression model was built as one of the appropriate models

to describe this phenomenon, with the following formula:

y_i equals the exponential of beta one x_i1 plus beta two x_i2 plus beta three x_i3 plus beta

four x_i4 plus beta five x_i5 plus beta six x_i6 plus beta seven x_i7, plus u_i.

Here y_i represents the total number of children with congenital heart

and circulatory defects in each period.

x_i1, the total weight of affected children within each period.

x_i2, the total age of the fathers of affected children within each period.

x_i3, the total age of the mothers of affected children within each period.

x_i4, the number of affected male children within each period.

x_i5, the number of affected female children within each period.

x_i6, the number of affected children born from consanguineous marriages

within each period.

x_i7, the number of affected children whose mothers were exposed to radiation

or similar influences, such as taking certain medications and drugs during pregnancy.

Beta one, beta two, beta three, beta four, beta five, beta six and beta seven are

the slope parameters in the model, and beta nought represents

the constant term.

u_i represents the random error in the model.
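Written out, the spoken model reads roughly as below. Two points are assumptions: the constant term $\beta_0$ is included because it is defined in the talk even though it is not read aloud in the formula, and the error $u_i$ is placed outside the exponential, which is one reading of the spoken "plus u_i".

```latex
y_i = \exp\!\left(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3}
      + \beta_4 x_{i4} + \beta_5 x_{i5} + \beta_6 x_{i6} + \beta_7 x_{i7}\right) + u_i
```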

This table is Testing Data and Diagnosing Multicollinearity. To find out the probability

distribution according to which the response variable is distributed,

we used JMP Pro 16.2, and it was found that y, the dependent variable, follows the Poisson

distribution with distribution parameter mu equal to 6.5.
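To make the reported parameter concrete, the sketch below evaluates the Poisson probability function at mu equal to 6.5 and checks numerically that the mean and the variance both equal mu. It uses only the Python standard library; the truncation at 60 terms is a choice made for this illustration, not part of the analysis in the talk.

```python
import math


def poisson_pmf(k, mu):
    """P(Y = k) for a Poisson variable with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


mu = 6.5  # parameter value reported in the talk

# Truncate the support at 60 terms; the remaining tail is negligible for mu = 6.5.
probs = [poisson_pmf(k, mu) for k in range(60)]
total = sum(probs)
mean = sum(k * p for k, p in enumerate(probs))
var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))

print("total probability:", round(total, 6))
print("mean:", round(mean, 6), "variance:", round(var, 6))
```

Equality of mean and variance is the property that distinguishes the Poisson distribution from the normal errors of linear regression.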

To verify the suitability of the Poisson

distribution for the response variable y,

a goodness-of-fit test was conducted for the variable of the total

number of children with congenital heart and circulatory defects in each period.

The results we obtained show that the Poisson distribution is the most

appropriate distribution that the dependent variable can follow,

where we note that the goodness-of-fit test value is 4958.4579 with a significance

level close to zero,

as in the table in front of you.

To detect whether there is multicollinearity among the predictor

variables under study,

we can calculate the correlation matrix between the predictor variables.

From the figure we observe that the values of the correlation coefficients are significant

and large for all predictor variables, as each variable is associated with all the other

predictor variables by a strong direct linear correlation.
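A correlation matrix of this kind can be computed as sketched below. The three columns are made-up numbers, not the hospital data, and the function names are hypothetical; the talk itself obtained the matrix in JMP.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Matrix of pairwise Pearson correlations between the given columns."""
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


# Made-up predictor columns: x2 is nearly a multiple of x1, x3 moves the other way.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8]
x3 = [5.0, 3.0, 4.0, 1.0, 2.0]

R = correlation_matrix([x1, x2, x3])
for row in R:
    print([round(r, 3) for r in row])
```

Off-diagonal entries close to one in absolute value, as between x1 and x2 here, are the pattern the talk reports across the predictors.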

This table shows the values of the variance inflation factors.

The largest of them are those of the predictor variables X2, X3 and X4.

The variance inflation factor for the remaining predictor variables

exceeds the number 18.

From this we conclude that there is linear multicollinearity between

the predictor variables and the [inaudible 00:16:24].

Application of the Poisson ridge regression method.

Estimating the parameters of the Poisson regression model using the ridge regression method,

we observe that the total number of children with congenital heart

and circulatory defects in each period depends on the increase in all

parameters of [inaudible 00:17:11].

However, most variables are insignificant, namely X1, X4,

X5 and X6,

because of the effects of semi-perfect multicollinearity.

Also, the result indicates that the bias parameter is k equal to

0.12.

[foreign language 00:17:53]

86.4959.

This table we can obtain by using JMP.

When applying the Liu estimator method to estimate the coefficients

of the Poisson regression model in the presence of the multicollinearity

problem,

we use a JMP script to connect between the R language and JMP.

This script connects to R and runs, from JMP,

the Liu regression package in the R language.

When applying it to [inaudible 00:19:12] the coefficients, we obtained this result.

We observe that the total number of children with congenital heart

and circulatory defects in each period depends on the extent

of increase in all parameters of the model,

despite the insignificance of the variable x_i1,

because all variables under study increase the number of children

with congenital disabilities, but in varying proportions.

Also, the result indicates

that the Akaike information criterion is 35.29

and the bias parameter d equals 4.1.

When comparing the two methods, ridge regression and the Liu estimator,

we note that the Liu estimator approach gives a lower value of the information criterion

and has more significant coefficients compared to the ridge regression method.
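The comparison rule can be sketched as follows. The `aic` function is the standard definition of the Akaike information criterion. The two numbers are the values heard in the talk, with 86.4959 assumed to be the ridge regression AIC, since the transcript is unclear at that point.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2 * (number of parameters) - 2 * log-likelihood."""
    return 2 * n_params - 2 * log_likelihood


# AIC values as heard in the talk; attributing 86.4959 to ridge is an assumption.
reported = {"ridge": 86.4959, "liu": 35.29}

# The model with the lowest AIC is preferred.
best = min(reported, key=reported.get)
print("preferred method by AIC:", best)
```

Lower AIC means a better trade-off between fit and the number of parameters, which is the basis on which the Liu estimator is preferred here.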

Conclusions.

In this paper, we reviewed the most prominent methods of parameter estimation

for the Poisson regression model when the data suffer from the problem

of semi-perfect multicollinearity. We took the ridge regression method

and the Liu estimator method and compared the two methods based

on the Akaike information criterion as the criterion for comparison.

By applying the regression analysis methods in the presence of a semi-perfect

multicollinearity problem to real data regarding congenital heart

and circulatory defects in newborns, obtained from the Central Child Teaching

Hospital for the period from 2012 to 2019,

we find that the Liu estimator method is more efficient than the ridge regression

method because it has a lower Akaike information criterion.

It also gives more reliable results and more accurate p-values.

Number 3: Through the Liu estimator method, it is clear that all predictor variables

under study are influential in the regression model,

even if they are not significant, as all parameters are influential

in increasing the number of children with congenital disabilities, but

in varying proportions.

Thank you.

Published on ‎05-20-2024 07:53 AM by | Updated on ‎07-23-2025 11:14 AM
