Hello everyone,
my name is Raaed Fadhil Mohammed.
I am a statistician and a lecturer at the University of Mustansiriyah.
The title of my paper is Estimating the Parameter of Poisson Regression Model Under
the Multicollinearity Problem.
Outline: Poisson Regression Model, Multicollinearity Problem,
Ridge Regression Estimator Method,
Liu Estimators Method, Real Data Example,
Conclusions, and References.
Poisson Regression Model.
It is one of the types of regression model that fall under the log-linear
regression models, because taking the natural logarithm
of the distribution formula turns it into a linear form.
Random errors in the model follow a Poisson distribution with a parameter
mu.
The model is based on two essential assumptions: one about the distribution
of the random errors, which differs from that in the linear regression
model, and one about the Poisson distribution parameter mu being a function
of the predictor variables.
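As a minimal sketch of the log-linear form described above, the Python snippet below simulates counts whose log mean is linear in one predictor and recovers the coefficients by maximum likelihood, using iteratively reweighted least squares. The simulated data, the coefficient values, and the helper name fit_poisson_irls are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def fit_poisson_irls(X, y, n_iter=100, tol=1e-10):
    """Fit log(mu) = X @ beta by maximum likelihood (iteratively reweighted least squares)."""
    mu = y + 0.1                      # common starting value, keeps log(mu) finite
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta + (y - mu) / mu       # working response
        XtW = X.T * mu                # weights equal the Poisson variance mu_i
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        eta = X @ beta
        mu = np.exp(eta)
        if converged:
            break
    return beta

# Hypothetical simulated data: an intercept plus one predictor.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([1.0, 0.5])
y = rng.poisson(np.exp(X @ true_beta))

beta_hat = fit_poisson_irls(X, y)
print("estimated coefficients:", beta_hat)
```

At the solution the score equations X transpose times (y minus mu) are zero, which is a convenient check that the fit has converged.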
Multicollinearity Problem.
Multicollinearity problems occur when two or more predictor variables are tied
by a strong linear relationship, so that it is difficult to separate the effect
of each predictor variable on the dependent variable
in practice.
They also occur when the value of one of the predictor variables depends on one or more
of the other predictor variables in the model under study,
or when the data take the form of a time series or cross-sectional data.
The multicollinearity problem can be classified into two types:
Number 1, Perfect multicollinearity.
The determinant of the information matrix is zero: the determinant of X transpose X equals
zero.
It follows that it is impossible to estimate
the parameters of the regression model, because the inverse of the matrix
X transpose X cannot be calculated.
The best method in this case is to make use of
principal component analysis.
Number 2, Semi-perfect multicollinearity.
In this case, the value of the determinant of the information matrix
is very small, close to zero,
and the estimated parameters then have considerable variance.
The best methods in this case are the ridge regression method or the Liu
estimator method.
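The two cases above can be illustrated numerically. The sketch below uses hypothetical simulated predictors: one design matrix with an exact linear dependence and one with a near dependence. It reports the rank of X and the condition number of X transpose X. In floating point the determinant is rarely exactly zero, so rank and condition number are used here as stand-ins for the determinant criterion.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
noise = 0.01 * rng.normal(size=n)

designs = {
    "perfect": np.column_stack([x1, x2, x1 + x2]),               # exact linear dependence
    "semi-perfect": np.column_stack([x1, x2, x1 + x2 + noise]),  # near dependence
}

summary = {}
for name, X in designs.items():
    XtX = X.T @ X
    summary[name] = {
        "rank": int(np.linalg.matrix_rank(X)),           # below 3 means X'X is singular
        "condition_number": float(np.linalg.cond(XtX)),  # large means nearly singular
    }
    print(name, summary[name])
```

In the perfect case the rank drops below the number of columns, so X transpose X has no inverse. In the semi-perfect case the inverse exists but the large condition number signals inflated variances.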
The following formula here expresses the variance-covariance matrix
of the estimated parameters.
Perhaps the best statistical method for measuring the intensity of
multicollinearity is the variance inflation factor, VIF, whose formula is as follows.
VIF equals one over one minus R square.
R square here is the coefficient of determination.
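The VIF formula can be sketched in Python as follows. Each predictor is regressed on the remaining predictors, and R square from that auxiliary regression is plugged into one over one minus R square. The simulated predictors and the function name vif are assumptions for illustration only.

```python
import numpy as np

def vif(X):
    """Return 1 / (1 - R_j^2) for each column j, where R_j^2 comes from
    regressing column j on the remaining columns plus an intercept."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Hypothetical predictors: x2 is strongly tied to x1, x3 is unrelated.
rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

vifs = vif(X)
print("VIF per predictor:", vifs)
```

The same values can be read off the diagonal of the inverse of the correlation matrix of the predictors, which is a quick cross-check.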
Ridge Regression Estimator Method.
It is one of the important alternatives for estimating the parameters
of the regression model when there is multicollinearity between the predictor
variables.
This method was established according to the principle of the researchers Hoerl
and Kennard, which is to add a small positive quantity k to the main diagonal
elements of the information matrix.
The ridge regression estimators are biased when k is greater than zero, and the amount
of bias can be expressed by the formula Z minus identity, times beta.
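The exact formula from the slide is not reproduced in this transcript. The sketch below assumes a form often used for Poisson ridge regression: the maximum likelihood estimate is multiplied by Z, the inverse of (X transpose W X plus k times identity) times X transpose W X, where W holds the fitted means on its diagonal. The simulated data, the value of k, and all function names are hypothetical.

```python
import numpy as np

def fit_poisson_ml(X, y, n_iter=100, tol=1e-10):
    """Maximum likelihood fit of log(mu) = X @ beta by iteratively reweighted least squares."""
    mu = y + 0.1
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta + (y - mu) / mu
        XtW = X.T * mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        done = np.max(np.abs(beta_new - beta)) < tol
        beta, eta = beta_new, X @ beta_new
        mu = np.exp(eta)
        if done:
            break
    return beta

def poisson_ridge(X, beta_ml, k):
    """Return the ridge estimate Z @ beta_ml and Z = (X'WX + kI)^(-1) X'WX (assumed form)."""
    mu_hat = np.exp(X @ beta_ml)
    info = (X.T * mu_hat) @ X                  # X' W X with W = diag(mu_hat)
    Z = np.linalg.solve(info + k * np.eye(X.shape[1]), info)
    return Z @ beta_ml, Z

# Hypothetical data with two nearly collinear predictors and no intercept.
rng = np.random.default_rng(4)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = rng.poisson(np.exp(0.3 * x1 + 0.3 * x2))

beta_ml = fit_poisson_ml(X, y)
k = 0.5                                        # illustrative value, not from the paper
beta_ridge, Z = poisson_ridge(X, beta_ml, k)
bias_plugin = (Z - np.eye(2)) @ beta_ml        # (Z - I) beta with beta_ml plugged in
print("ML:", beta_ml, "ridge:", beta_ridge, "plug-in bias:", bias_plugin)
```

Setting k to zero makes Z the identity and returns the maximum likelihood estimate. Any k greater than zero shrinks the estimate, which is where the bias comes from.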
Liu Estimators Method.
The researcher Liu (1993) laid the foundations of this method to address
the issue of the variance inflation of the estimated parameters
in the presence of the multicollinearity problem.
The Liu estimator for the parameters of the Poisson regression can be expressed
in the following formula.
Liu estimators are also biased when d is greater than zero, and the magnitude
of the bias is Z minus identity, times beta.
The reason for the bias is the added value
d, which ranges between zero and one.
Also, the mean squared error calculated according to the Liu estimators method
is less than the mean squared error for the same parameters if estimated
according to the maximum likelihood method.
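The formula shown on the slide is not captured in this transcript. As a hedged sketch, the code below assumes a commonly used form of the Liu estimator for Poisson regression: the maximum likelihood estimate is multiplied by Z, the inverse of (X transpose W X plus identity) times (X transpose W X plus d times identity). The information matrix, the maximum likelihood estimate, and the function name are hypothetical inputs chosen only to show the mechanics.

```python
import numpy as np

def poisson_liu(info, beta_ml, d):
    """Liu-type shrinkage of a maximum likelihood estimate (assumed form).

    info is X' W X evaluated at the ML fit; d is the Liu parameter in [0, 1].
    Returns the Liu estimate and the plug-in bias (Z - I) @ beta_ml.
    """
    identity = np.eye(info.shape[0])
    Z = np.linalg.solve(info + identity, info + d * identity)
    beta_liu = Z @ beta_ml
    bias = (Z - identity) @ beta_ml
    return beta_liu, bias

# Hypothetical inputs: a nearly singular information matrix and an ML estimate.
info = np.array([[10.0, 9.9],
                 [9.9, 10.0]])
beta_ml = np.array([1.2, -0.6])

for d in (0.0, 0.5, 1.0):
    beta_liu, bias = poisson_liu(info, beta_ml, d)
    print("d =", d, "estimate:", beta_liu, "plug-in bias:", bias)
```

Under this assumed form, d equal to one reproduces the maximum likelihood estimate, and smaller values of d shrink it more strongly.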
Real Data Example.
We obtained real data concerning congenital defects of the heart
and circulatory system in newborns from the Central Child Teaching Hospital
in Baghdad, Iraq. We studied the distribution of the dependent
variable y, which represents abnormalities of the heart and circulatory system
in children,
and also revealed the existence of a multicollinearity problem among
the predictor variables under study.
The cases of congenital disabilities arriving at the Central Child Teaching
Hospital are recorded in a form prepared by the Statistics Division in the hospital
as count data and totals within semi-monthly periods.
The sample was taken for the period from 2012 to the end of 2019,
and a Poisson regression model was built as one of the appropriate models
to describe this phenomenon, with the following formula:
yi equals the exponential of beta naught plus beta 1 xi1 plus beta 2 xi2 plus beta 3 xi3 plus beta
4 xi4 plus beta 5 xi5 plus beta 6 xi6 plus beta 7 xi7, plus ui.
Here y represents the total number of children with congenital heart
and circulatory defects in each period.
Xi1 is the total weight of affected children within each period.
Xi2 is the total age of the fathers of affected children within each period.
Xi3 is the total age of the mothers of affected children within each period.
Xi4 represents the number of affected male children within each period.
Xi5 represents the number of affected female children within each period.
Xi6 is the number of affected children born from consanguineous marriages
within each period.
Xi7 is the number of affected children whose mothers were exposed to radiation
or to life influences such as taking certain medications and drugs during pregnancy.
Beta 1, beta 2, beta 3, beta 4, beta 5, beta 6 and beta 7 are
the slope parameters in the model, and beta naught represents
the constant term.
ui represents the random error in the model.
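Written in symbols, the model read out above can be sketched as follows. Two things here are assumptions drawn from the spoken description rather than from the slide: the constant term beta naught is placed inside the exponential together with the seven slope terms, and the error term sits outside the exponential with mean zero.

```latex
y_i = \exp\!\left(\beta_0 + \sum_{j=1}^{7} \beta_j x_{ij}\right) + u_i,
\qquad
\log \mu_i = \beta_0 + \sum_{j=1}^{7} \beta_j x_{ij},
\qquad
\mu_i = E(y_i).
```

The second expression is the log-linear form: the logarithm of the expected count is linear in the predictors.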
Testing Data and Diagnosing Multicollinearity.
To find out the probability distribution that the response variable follows,
we used JMP Pro 16.2, and it was found that the dependent variable y follows the Poisson
distribution with parameter mu equal to 6.5.
To verify the suitability of the Poisson
distribution for the response variable y,
a goodness-of-fit test was conducted for the variable of the total
number of children with congenital heart and circulatory defects in each period.
The test which we made shows that the Poisson distribution is the most
appropriate distribution that the dependent variable can follow,
where we note that the goodness-of-fit test value is 4958.4579 with a significance
level close to zero,
as shown in the table in front of you.
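The transcript does not say which goodness-of-fit statistic JMP reported. As one possible illustration, the sketch below computes a Pearson chi-square statistic that compares observed count frequencies with those expected under a fitted Poisson distribution. The data are simulated with a mean of 6.5 only to mirror the value mentioned above; the sample size, the pooling threshold max_k, and the function name are hypothetical.

```python
import math
import numpy as np

def poisson_pearson_gof(counts, max_k):
    """Pearson chi-square statistic for a fitted Poisson distribution.

    Counts of max_k or more are pooled into a single tail bin.
    """
    counts = np.asarray(counts)
    n = counts.size
    mu_hat = counts.mean()                               # ML estimate of mu
    probs = [math.exp(-mu_hat) * mu_hat**k / math.factorial(k) for k in range(max_k)]
    probs.append(1.0 - sum(probs))                       # P(Y >= max_k)
    expected = n * np.array(probs)
    observed = np.bincount(np.minimum(counts, max_k), minlength=max_k + 1)
    statistic = float(np.sum((observed - expected) ** 2 / expected))
    dof = (max_k + 1) - 1 - 1                            # bins - 1 - one estimated parameter
    return statistic, dof, mu_hat

# Hypothetical simulated counts, not the hospital data.
rng = np.random.default_rng(3)
y = rng.poisson(6.5, size=200)
statistic, dof, mu_hat = poisson_pearson_gof(y, max_k=12)
print("statistic:", statistic, "degrees of freedom:", dof, "mu_hat:", mu_hat)
```

The statistic would then be compared with a chi-square distribution on the stated degrees of freedom. Bins with very small expected counts are usually pooled first, which this sketch does only for the upper tail.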
To detect whether there is multicollinearity among the predictor
variables under study,
we can calculate the correlation matrix between the predictor variables.
From the figure we observe that the values of the correlation coefficients are significant
and large for all predictor variables, as each variable is associated with all the other
predictor variables by a strong direct linear correlation.
This table shows the values of the variance inflation factors.
The largest of them were those of the predictor variables X2, X3 and X4.
The variance inflation factor for the remaining predictor variables
exceeds the number 18.
From this we conclude that there is linear multicollinearity between
the predictor variables and the [inaudible 00:16:24].
Application of Poisson regression method.
For the parameter estimates of the Poisson regression model using the ridge regression method,
we observe that the total number of children with congenital heart
and circulatory defects in each period depends on the increase in all
parameters of [inaudible 00:17:11].
However, most variables are insignificant, namely X1, X4,
X5 and X6,
because of the effects of semi-perfect multicollinearity.
Also, the result indicates that the bias parameter k equals
0.12.
[foreign language 00:17:53]
86.4959.
We can obtain this table by using JMP.
When applying the Liu estimator method to estimate the coefficients
of the Poisson regression model in the presence of the multicollinearity
problem,
we use a JMP script to connect between the R language and JMP.
This script connects to R and runs, from JMP,
the Liu regression package in the R language.
While applying to [inaudible 00:19:12], the coefficient, we obtained this result.
We observed that the total number of children with congenital heart
and circulatory defects in each period depends on the extent
of increase in all parameters of the model,
despite the insignificance of the variable Xi1,
because all variables under study increase the number of children
with congenital disabilities, but in varying proportions.
Also, the result indicates
that the Akaike information criterion is 35.29
and the bias parameter d equals 4.1.
When comparing the two methods, ridge regression and the Liu estimator,
we note that the Liu estimator approach gives a lower value of the information criterion
and has more significant coefficients when compared to the ridge regression method.
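The comparison above rests on the Akaike information criterion, where a lower value is preferred. A minimal sketch of the calculation for a Poisson model is given below. The counts, the two sets of fitted means, and the parameter count are hypothetical, and the exact way JMP or the R package computes the criterion for biased estimators may differ.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood of counts y under fitted means mu."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))

def aic(loglik, n_params):
    """Akaike information criterion: 2 * (number of parameters) - 2 * log-likelihood."""
    return 2 * n_params - 2 * loglik

# Hypothetical counts and fitted means from two competing fits.
y = [3, 7, 5, 9]
mu_a = [3.2, 6.8, 5.1, 8.9]   # fit A tracks the counts closely
mu_b = [6.0, 6.0, 6.0, 6.0]   # fit B uses one common mean

aic_a = aic(poisson_loglik(y, mu_a), n_params=2)
aic_b = aic(poisson_loglik(y, mu_b), n_params=2)
print("AIC fit A:", aic_a, "AIC fit B:", aic_b)
```

With the same number of parameters, the fit with the higher log-likelihood has the lower criterion and would be preferred.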
Conclusions.
Number 1: In this paper, we reviewed the most prominent methods of parameter estimation
for the Poisson regression model when the data suffer from the problem
of semi-perfect multicollinearity. We took the ridge regression method
and the Liu estimators method and compared the two methods using
the Akaike information criterion as the criterion for comparison.
Number 2: By applying the regression analysis methods in the presence of a semi-perfect
multicollinearity problem to real data regarding congenital heart
and circulatory defects in newborns, obtained from the Central Child Teaching
Hospital for the period from 2012 to 2019,
we find that the Liu estimators method is more efficient than the ridge regression
method because it has a lower Akaike information criterion.
It also gives more reliable results and more accurate p-values.
Number 3: Through the Liu estimators method, it is clear that all predictor variables
under study are influential in the regression model,
even if they are not significant, as all parameters contribute
to increasing the number of children with congenital disabilities, but
in varying proportions.
Thank you.