Estimating the Parameters of the Poisson Regression Model Under the Multicollinearity Problem (2022-US-30MP-1148)

Raaed Fadhil, statistician, University of Mustansiriyah - Ministry of Higher Education & Scientific Research - Iraq


In this paper, we review the most prominent methods for estimating the parameters of the Poisson regression model when the data suffer from a semi-perfect multicollinearity problem, namely ridge regression and the Liu estimator method. The estimation methods were applied to real data obtained from Central Child Hospital in Baghdad, representing the number of cases of congenital defects of the heart and circulatory system in children for the period 2012-2019. The results showed the superiority of the Liu estimator method over the ridge regression method, using the Akaike information criterion (AIC) as the criterion for comparison.


Keywords: Poisson regression, Liu estimators, Multicollinearity problem.




Hello everyone,

my name is Raaed Fadhil Mohammed.

I am a statistician and a lecturer at the University of Mustansiriyah.

My paper title is Estimating the Parameters of the Poisson Regression Model Under the Multicollinearity Problem.

Outline: Poisson Regression Model, Multicollinearity Problem, Ridge Regression Estimator Method, Liu Estimators Method, Real Data Example, Conclusion, and References.

Poisson Regression Model.

It is one of the regression models that fall under the log-linear regression models, because taking the natural logarithm of the distribution formula turns it into a linear form.

Random errors in the model follow a Poisson distribution with a parameter mu.

The model is based on two essential assumptions: the distribution of the random errors, which differs from the distribution of random errors in the linear regression model, and the form of the Poisson distribution parameter mu as a function of the predictor variables.
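For reference, the model described here can be written out as follows. This is the standard textbook form of Poisson regression with a log link; the symbols $y_i$, $\mu_i$, $x_i$ and $\beta$ are notation assumed here, since the slide formula is not part of the transcript.

```latex
P(Y_i = y_i) = \frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!}, \qquad y_i = 0, 1, 2, \dots

\mu_i = \exp\!\left(x_i^{\top}\beta\right)
\;\Longleftrightarrow\;
\ln \mu_i = x_i^{\top}\beta
```

The second line is the sense in which taking the natural logarithm gives a linear form: the log of the mean is linear in the predictors.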

Multicollinearity Problem.

Multicollinearity problems occur when two or more predictor variables are correlated through a strong linear relation, so it is difficult in practice to separate the effect of each predictor variable on the dependent variable.

They also occur when the value of one of the predictor variables depends on one or more of the other predictor variables in the model under study, as well as when the data take the form of a time series or cross-sectional data.

The multicollinearity problem can be classified into two types:

Number 1, perfect multicollinearity.

The determinant of the information matrix is zero, that is, the determinant of X transpose X equals zero.

It follows from this that it is impossible to estimate the parameters of the regression model, due to the inability to calculate the inverse of the matrix X transpose X.

The best method in this case is to make use of principal component analysis.
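A small sketch of why perfect multicollinearity blocks estimation, assuming a design matrix with just two columns. The function name `gram_determinant` and the data values are made up for illustration; they are not from the talk.

```python
def gram_determinant(x1, x2):
    """Determinant of X'X for a design matrix whose two columns are x1 and x2."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    return s11 * s22 - s12 * s12


# Hypothetical predictor values (illustration only).
x1 = [1, 2, 3, 4]
x2_perfect = [2 * a for a in x1]  # exact linear function of x1
x2_other = [1, 0, 2, 1]           # not a multiple of x1

det_perfect = gram_determinant(x1, x2_perfect)
det_other = gram_determinant(x1, x2_other)
print(det_perfect, det_other)
```

When one column is an exact multiple of another, the determinant is exactly zero, so X'X has no inverse and the usual estimator cannot be computed.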

Number 2, semi-perfect multicollinearity.

In this case, the value of the determinant of the information matrix is very small, close to zero, so the estimated parameters have a considerably large variance.

The best method in this case is to use the ridge regression method or the Liu estimator method.

The following formula expresses the variance-covariance matrix of the estimated parameters.
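The slide formula is not captured in the transcript. As a hedged reminder, these are the standard textbook forms: the first for ordinary linear regression, the second the asymptotic covariance of the Poisson maximum likelihood estimator, with $\hat{W} = \mathrm{diag}(\hat{\mu}_i)$. Which of the two the slide showed is an assumption.

```latex
\operatorname{Var}\!\left(\hat{\beta}_{\mathrm{OLS}}\right) = \sigma^{2}\left(X^{\top}X\right)^{-1},
\qquad
\operatorname{Var}\!\left(\hat{\beta}_{\mathrm{ML}}\right) \approx \left(X^{\top}\hat{W}X\right)^{-1}
```

In both forms, a determinant close to zero makes the entries of the inverse, and hence the variances, very large.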

Perhaps the best statistical method for measuring the intensity of multicollinearity is the variance inflation factor (VIF), whose formula is as follows.

VIF equals one over one minus R squared.

R squared here is the coefficient of determination.
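The VIF formula just stated can be sketched in a few lines. The function names are illustrative, and R squared is taken to come from regressing one predictor on the remaining predictors, which is the usual convention and an assumption about what the slide intended.

```python
def vif(r_squared):
    """Variance inflation factor: 1 / (1 - R^2)."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def r_squared_from_vif(vif_value):
    """Invert the formula: the R^2 that corresponds to a given VIF."""
    return 1.0 - 1.0 / vif_value


vif_uncorrelated = vif(0.0)   # no linear relation with the other predictors
vif_strong = vif(0.95)        # strong linear relation
print(vif_uncorrelated, vif_strong, r_squared_from_vif(10.0))
```

Reading the formula backwards is often useful: a VIF of 10 corresponds to an R squared of 0.9 for that predictor.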

Ridge Regression Estimator Method.

It is one of the important alternatives for estimating the parameters of the regression model when there is multicollinearity between the predictor variables.

This method was established according to the principle of the researchers Hoerl and Kennard, which is to add a small positive quantity to the main diagonal elements of the information matrix.

The ridge regression estimators are biased when k is greater than zero, and the amount of bias can be expressed by the formula: Z minus the identity matrix, times beta.
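A minimal sketch of a ridge estimator for Poisson regression, assuming the commonly cited form beta_k = (X'WX + kI)^(-1) X'WX beta_ML with W = diag(mu_hat). The slide formula is not in the transcript, so this form, the function names, and the synthetic data are all assumptions for illustration.

```python
import numpy as np


def poisson_ml(X, y, n_iter=100, tol=1e-10):
    """Maximum likelihood fit of log(mu) = X @ beta by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        info = X.T @ (mu[:, None] * X)  # X' W X with W = diag(mu)
        step = np.linalg.solve(info, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def poisson_ridge(X, y, k):
    """Return (beta_ridge, Z) with Z = (X'WX + kI)^(-1) X'WX."""
    beta_ml = poisson_ml(X, y)
    mu = np.exp(X @ beta_ml)
    info = X.T @ (mu[:, None] * X)
    Z = np.linalg.solve(info + k * np.eye(X.shape[1]), info)
    return Z @ beta_ml, Z


# Synthetic, nearly collinear data (not the hospital data).
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)
X = np.column_stack([np.ones(200), x1, x2])
y = rng.poisson(np.exp(0.5 + 0.3 * x1 + 0.2 * x2))

beta_ml = poisson_ml(X, y)
beta_k0, _ = poisson_ridge(X, y, k=0.0)      # k = 0 reproduces the ML fit
beta_ridge, Z = poisson_ridge(X, y, k=0.5)   # k > 0 shrinks the estimate
bias_factor = Z - np.eye(3)                  # bias = (Z - I) @ beta
```

Because the eigenvalues of Z lie strictly between 0 and 1 when k is positive, the ridge estimate is always shorter than the maximum likelihood estimate, which is the source of both the bias and the variance reduction.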

Liu Estimators Method.

The researcher Liu (1993) laid the foundations of this method to address the issue of the variance inflation of the estimated parameters in the presence of a multicollinearity problem.

The Liu estimator for the parameters of the Poisson regression can be expressed by the following formula.

Liu estimators are also biased when d is greater than zero, and the magnitude of the bias is Z minus the identity matrix, times beta.

The reason for the bias is the added value d, which ranges between zero and one.

Also, the mean squared error calculated according to the Liu estimators' method is less than the mean squared error for the same parameters if estimated according to the maximum likelihood method.
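A hedged sketch of the Liu estimator, assuming the commonly cited form beta_d = (X'WX + I)^(-1) (X'WX + dI) beta_ML. The slide formula is not in the transcript, and the matrix and coefficient values below are hypothetical placeholders rather than results from the talk.

```python
import numpy as np


def liu_estimator(info, beta_ml, d):
    """Return (beta_liu, Z) with Z = (A + I)^(-1) (A + d I) and A = X'WX."""
    eye = np.eye(info.shape[0])
    Z = np.linalg.solve(info + eye, info + d * eye)
    return Z @ beta_ml, Z


# Hypothetical information matrix with strongly related columns,
# and a hypothetical maximum likelihood estimate.
info = np.array([[50.0, 49.0],
                 [49.0, 50.0]])
beta_ml = np.array([1.2, -0.4])

beta_liu, Z = liu_estimator(info, beta_ml, d=0.5)
beta_d1, _ = liu_estimator(info, beta_ml, d=1.0)
bias_factor = Z - np.eye(2)  # bias = (Z - I) @ beta
```

In this form, d = 1 gives Z equal to the identity and returns the maximum likelihood estimate, while values of d below 1 shrink it.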

Real Data Example.

We obtained real data concerning congenital defects of the heart and circulatory system in newborns from the Central Child Teaching Hospital in Baghdad, Iraq, where the distribution of the dependent variable y, which represents abnormalities of the heart and circulatory system in children, was studied.

The existence of a multicollinearity problem among the predictor variables under study was also revealed.

The cases of congenital disabilities arriving at the Central Child Teaching Hospital are recorded in a form prepared by the Statistics Division in the hospital, in the form of count data and totals within semi-monthly periods.

The sample was taken for the period from 2012 to the end of 2019, and a Poisson regression model was built as one of the appropriate models to describe this phenomenon, as in the following formula:

yi equals exponential of beta 1 xi1 plus beta 2 xi2 plus beta 3 xi3 plus beta 4 xi4 plus beta 5 xi5 plus beta 6 xi6 plus beta 7 xi7, plus ui.

Here yi represents the total number of children with congenital heart and circulatory defects in each period.

Xi1, the total weight of affected children within each period.

Xi2, the total ages of the fathers of affected children within each period.

Xi3, the total ages of the mothers of affected children within each period.

Xi4, the number of affected male children within each period.

Xi5, the number of affected female children within each period.

Xi6, the number of affected children born from consanguineous marriages within each period.

Xi7, the number of affected children whose mothers were exposed to radiation or similar influences, such as taking certain medications and drugs during pregnancy.

Beta 1, beta 2, beta 3, beta 4, beta 5, beta 6 and beta 7 are the slope parameters in the model, and beta naught represents the constant term.

ui represents the random error in the model.
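Written out in symbols, the spoken formula corresponds to something like the following. Including the constant term $\beta_0$ inside the exponential and placing $u_i$ outside it are assumptions made here for readability; the slide itself is not part of the transcript.

```latex
y_i = \exp\!\left(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3}
      + \beta_4 x_{i4} + \beta_5 x_{i5} + \beta_6 x_{i6} + \beta_7 x_{i7}\right) + u_i
```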

This table, Testing Data and Diagnosing Multicollinearity, is used to find out the probability distribution that the response variable follows.

We used JMP Pro 16.2, and it was found that y, the dependent variable, follows the Poisson distribution with distribution parameter mu equal to 6.5.

To verify the suitability of the Poisson distribution for the response variable y, a goodness-of-fit test was conducted for the variable of the total number of children with congenital heart and circulatory defects in each period.

The results we obtained show that the Poisson distribution is the most appropriate distribution that the dependent variable can follow.

We note that the goodness-of-fit test value is 4958.4579, with a significance level close to zero, as in the table in front of you.

To detect whether there is multicollinearity among the predictor variables under study, we can calculate the correlation matrix between the predictor variables.

From the figure we observe that the values of the correlation coefficients are significant and large for all predictor variables, as each variable is associated with all the other predictor variables with a strong direct linear correlation.
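The correlation matrix check can be sketched as below. The talk did this in JMP; the plain-Python functions and the three short columns here are illustrative assumptions, not the hospital data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations between the given predictor columns."""
    return [[pearson(a, b) for b in columns] for a in columns]


# Hypothetical predictor columns (illustration only).
males = [3, 5, 2, 6, 4]
females = [2, 4, 2, 5, 3]
consanguineous = [1, 3, 1, 4, 2]

corr = correlation_matrix([males, females, consanguineous])
for row in corr:
    print([round(value, 3) for value in row])
```

Off-diagonal entries close to 1 in absolute value are the pattern the speaker describes as a strong direct linear correlation.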

This table shows the values of the variance inflation factors.

The largest of them are those of the predictor variables X2, X3 and X4.

The variance inflation factor for the remaining predictor variables exceeds the number 18.

From this we conclude that there is linear multicollinearity between the predictor variables and the [inaudible 00:16:24].

Application of the Poisson regression method.

From the parameter estimates of the Poisson regression model using the ridge regression method, we observe that the total number of children with congenital heart and circulatory defects in each period depends on the increase in all parameters of [inaudible 00:17:11].

However, most variables are insignificant, namely X1, X4, X5 and X6, because of the effects of semi-perfect multicollinearity.

Also, the result indicates that the bias parameter is k equal to 0.12.

[foreign language 00:17:53]


This table can be obtained by using JMP.

Next, we apply the Liu estimator method to estimate the coefficients of the Poisson regression model in the presence of multicollinearity.

We use a JMP script to connect the R language and JMP.

This script connects to and runs, from JMP, the Liu regression package in the R language.

When applying it to [inaudible 00:19:12] the coefficients, we obtained this result.

We observed that the total number of children with congenital heart and circulatory defects in each period depends on the extent of the increase in all parameters of the model, despite the insignificance of the variable xi1.

This is because all variables under study increase the number of children with congenital disabilities, and the estimator has very good properties.

Also, the result indicates that the Akaike information criterion is 35.29 and the bias parameter d equals 4.1.

When comparing the two methods, ridge regression and the Liu estimator, we note that the Liu estimator approach gives a lower value of the information criterion and has more significant coefficients compared to the ridge regression method.
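The comparison rule can be sketched with the usual definition AIC = 2p - 2 ln L, where the model with the lower value is preferred. The model labels and log-likelihood values below are hypothetical placeholders, not the figures from the talk, and how JMP or the R package computes the criterion for these estimators is not stated in the transcript.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2 * p - 2 * ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical (log-likelihood, number of parameters) pairs.
candidates = {
    "model_a": (-20.0, 8),
    "model_b": (-10.0, 8),
}

scores = {name: aic(ll, p) for name, (ll, p) in candidates.items()}
best = min(scores, key=scores.get)  # lower AIC is preferred
print(scores, best)
```

With the same number of parameters in both models, the comparison reduces to which fit has the higher log-likelihood.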


In this paper, we reviewed the most prominent methods of parameter estimation for the Poisson regression model when the data suffer from the problem of semi-perfect multicollinearity, where we took the ridge regression method and the Liu estimators' method and compared the two methods based on the Akaike information criterion as the criterion for comparison.

By applying the regression analysis methods in the presence of a semi-perfect multicollinearity problem to real data regarding congenital heart and circulatory defects in newborns, obtained from the Central Child Teaching Hospital for the period from 2012 to 2019, we find that the Liu estimators' method is more efficient than the ridge regression method because it has a lower Akaike information criterion.

It also gives more reliable results and more accurate p-values.

Number 3: Through the Liu estimators' method, it is clear that all predictor variables under study are influential in the regression model, even if they are not significant, as all parameters are influential in increasing the number of children with congenital disabilities, but in varying proportions.

Thank you.