In this paper, we review the most prominent methods for estimating the parameters of the Poisson regression model when the data suffer from semi-perfect multicollinearity, namely ridge regression and the Liu estimator. The estimation methods were applied to real data obtained from Central Child Hospital in Baghdad, representing the number of cases of congenital defects of the heart and circulatory system in children for the period from 2012 to 2019. The results showed the superiority of the Liu estimator over ridge regression, using the Akaike information criterion (AIC) as the criterion for comparison.

 

Keywords: Poisson regression, Liu estimators, Multicollinearity problem.

 
 
 
 

 


     

    Hello  everyone,

    my  name  is  Raaed Fadhil  Mohammed.

    I am a statistician and a lecturer at the University of Mustansiriyah.

    The title of my paper is Estimating the Parameter of Poisson Regression Model Under the Multicollinearity Problem.

    Outline: Poisson Regression Model, Multicollinearity Problem, Ridge Regression Estimator Method, Liu Estimators Method, Real Data Example, Conclusion, and References.

    Poisson Regression  Model.

    It is one of the regression models that fall under the log-linear regression models, because taking the natural logarithm of the distribution formula turns it into a linear form.

    The random errors in the model follow a Poisson distribution with a parameter mu.

    The model is based on two essential assumptions: the distribution of the random errors, which differs from the distribution of random errors in the linear regression model, and the property that the Poisson distribution parameter mu is a function of the predictor variables.
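As a compact restatement of these two assumptions, the model can be written as below. The slide formulas are not part of the transcript, so the notation ($y_i$ for the count, $\mu_i$ for its mean, $x_i$ for the predictor vector, $\beta$ for the coefficients) is an assumption chosen for illustration.

```latex
P(Y_i = y_i) = \frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!}, \qquad y_i = 0, 1, 2, \dots
\qquad\text{with}\qquad
\log \mu_i = x_i^{\top}\beta
```

Taking the logarithm of the mean is what makes the right-hand side linear in the coefficients.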

    Multicollinearity Problem.

    The multicollinearity problem occurs when two or more predictor variables are linked by a strong linear relation, so that in practice it is difficult to separate the effect of each predictor variable on the dependent variable.

    It also occurs when the value of one of the predictor variables depends on one or more of the other predictor variables in the model under study, whether the data take the form of a time series or cross-section data.

    The  multicollinearity   problem  can  be classified  into  two  types:

    Number 1, perfect multicollinearity.

    The determinant of the information matrix is zero, that is, the determinant of X transpose X equals zero.

    It follows that it is impossible to estimate the parameters of the regression model, because the inverse of the matrix X transpose X cannot be calculated.

    The best method in this case is to make use of principal component analysis.
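A small numeric illustration of the perfect case, as a sketch only: the data and the helper names `gram_matrix` and `det2` are made up for this example and do not come from the talk. When one predictor is an exact multiple of another, the determinant of X transpose X is exactly zero; a slight perturbation makes it nonzero but small relative to the size of the matrix entries.

```python
def gram_matrix(columns):
    """Return X'X for a design matrix given as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


x1 = [1, 2, 3, 4]
x2_exact = [2 * v for v in x1]   # exact linear function of x1
x2_near = [2, 4, 6, 9]           # almost, but not exactly, 2 * x1

det_exact = det2(gram_matrix([x1, x2_exact]))   # singular: no inverse exists
det_near = det2(gram_matrix([x1, x2_near]))     # invertible, but poorly conditioned
```

In the exact case the inverse of X transpose X does not exist, which is why the usual estimator cannot be computed.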

    Number 2, semi-perfect multicollinearity.

    In this case, the value of the determinant of the information matrix is very small, close to zero, so the estimated parameters have considerable variance.

    The best method in this case is to use the ridge regression method or the Liu estimator method.

    The following formula expresses the variance-covariance matrix of the estimated parameters.

    Perhaps the best statistical method for measuring the intensity of multicollinearity is the variance inflation factor, VIF, whose formula is as follows.

    VIF equals one over one minus R squared.

    R squared here is the coefficient of determination obtained by regressing one predictor variable on the remaining predictor variables.
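The VIF formula can be sketched in a few lines. This example relies on the standard identity that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors' correlation matrix, which avoids running a separate regression per predictor. The function names and the toy data are assumptions for illustration; the talk itself used JMP for this step.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of predictors given as a list of columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


# Toy predictors with correlation 0.8, so each VIF is 1 / (1 - 0.64).
vif_values = vif([[1, 2, 3, 4, 5], [2, 1, 4, 3, 5]])
```

Uncorrelated predictors give a VIF of one, and the value grows without bound as R squared approaches one.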

    Ridge Regression  Estimator  Method.

    It is one of the important alternatives for estimating the parameters of the regression model when there is multicollinearity between the predictor variables.

    This method was established on the principle of the researchers Hoerl and Kennard, which is to add a small positive quantity to the main diagonal elements of the information matrix.

    The ridge regression estimators are biased when k is greater than zero, and the amount of bias can be expressed by the formula (Z minus identity) times beta.
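The slide formula is not captured in the transcript. A form of the Poisson ridge estimator commonly given in the literature is shown below as an assumption about what the slide contained; here $\hat{W} = \mathrm{diag}(\hat{\mu}_i)$ is the weight matrix at the maximum likelihood fit and $\hat{\beta}_{ML}$ is the maximum likelihood estimate.

```latex
\hat{\beta}_{k} = \left(X^{\top}\hat{W}X + kI\right)^{-1} X^{\top}\hat{W}X \,\hat{\beta}_{ML}
               = Z_{k}\,\hat{\beta}_{ML}, \qquad k > 0,
\qquad
\mathrm{Bias}\left(\hat{\beta}_{k}\right) \approx \left(Z_{k} - I\right)\beta
```

Setting $k = 0$ gives $Z_k = I$, so the estimator reduces to maximum likelihood and the bias term vanishes.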

    Liu  Estimators  method.

    The researcher Liu (1993) laid the foundations of this method to address the issue of the variance inflation of the estimated parameters in the presence of a multicollinearity problem.

    The Liu estimator for the parameters of the Poisson regression can be expressed in the following formula.

    Liu estimators are also biased when d is greater than zero, and the magnitude of the bias is (Z minus identity) times beta.

    The reason for the bias is the added value d, which ranges between zero and one.

    Also,  the  calculated  mean  squared  error according  to  Liu  estimators'  method

    is  less  than  the  mean  squared  error for  the  same  parameters  if  estimated

    according  to  the  maximum  likelihood method.
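The two shrinkage estimators can be sketched side by side. This is a minimal illustration under stated assumptions, not the implementation used in the talk: it fits the maximum likelihood estimate by iteratively reweighted least squares and then applies the forms commonly given in the literature, ridge as (X'WX + kI)^-1 X'WX b and Liu as (X'WX + I)^-1 (X'WX + dI) b, where b is the maximum likelihood estimate. The data are synthetic, all function names are hypothetical, the predictors are not standardized, and the intercept is shrunk together with the slopes for brevity. The value k = 0.12 is the one reported later in the talk; d = 0.5 is arbitrary.

```python
import math


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def weighted_gram(rows, w):
    """Return X' W X for design rows and a vector of weights."""
    p = len(rows[0])
    return [[sum(wi * r[j] * r[k] for r, wi in zip(rows, w)) for k in range(p)]
            for j in range(p)]


def matvec(a, v):
    """Matrix-vector product."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def add_diag(a, c):
    """Return a + c * I."""
    return [[aij + (c if i == j else 0.0) for j, aij in enumerate(row)]
            for i, row in enumerate(a)]


def poisson_ml(rows, y, iterations=50):
    """Maximum likelihood fit by iteratively reweighted least squares (log link)."""
    mu = [yi + 0.1 for yi in y]                  # common starting values
    eta = [math.log(m) for m in mu]
    beta = [0.0] * len(rows[0])
    for _ in range(iterations):
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        a = weighted_gram(rows, mu)              # X' W X with W = diag(mu)
        b = [sum(m * zi * r[j] for r, m, zi in zip(rows, mu, z))
             for j in range(len(beta))]          # X' W z
        beta = solve(a, b)
        eta = matvec(rows, beta)
        mu = [math.exp(e) for e in eta]
    return beta, weighted_gram(rows, mu)


def ridge_estimate(beta_ml, a, k):
    """Poisson ridge estimator: (X'WX + kI)^-1 X'WX beta_ML."""
    return solve(add_diag(a, k), matvec(a, beta_ml))


def liu_estimate(beta_ml, a, d):
    """Poisson Liu estimator: (X'WX + I)^-1 (X'WX + dI) beta_ML."""
    return solve(add_diag(a, 1.0), matvec(add_diag(a, d), beta_ml))


# Synthetic counts with two nearly collinear predictors (not the hospital data).
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.2, 7.9]
y = [2, 3, 3, 5, 6, 8, 11, 14]
rows = [[1.0, a, b] for a, b in zip(x1, x2)]

beta_ml, info = poisson_ml(rows, y)
beta_ridge = ridge_estimate(beta_ml, info, k=0.12)
beta_liu = liu_estimate(beta_ml, info, d=0.5)
```

With d equal to one the Liu formula returns the maximum likelihood estimate unchanged, and any d below one pulls the coefficient vector toward zero, which is the source of the bias described above.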

    Real  Data  example.

    We obtained real data concerning congenital defects of the heart and circulatory system in newborns from the Central Child Teaching Hospital in Baghdad, Iraq.

    We studied the distribution of the dependent variable y, which represents abnormalities of the heart and circulatory system in children, and checked for the existence of a multicollinearity problem among the predictor variables under study.

    The cases of congenital disabilities arriving at the Central Child Teaching Hospital are recorded in a form prepared by the Statistics Division in the hospital, as count data and totals within semi-monthly periods.

    The sample was taken for the period from 2012 to the end of 2019, and a Poisson regression model was built as one of the appropriate models to describe this phenomenon, with the following formula:

    $y_i = \exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + \beta_5 x_{i5} + \beta_6 x_{i6} + \beta_7 x_{i7}) + u_i$

    Here y represents the total number of children with congenital heart and circulatory defects in each period.

    Xi1, the total weight of affected children within each period.

    Xi2, the total ages of the fathers of affected children within each period.

    Xi3, the total ages of the mothers of affected children within each period.

    Xi4, the number of affected male children within each period.

    Xi5, the number of affected female children within each period.

    Xi6, the number of affected children born from consanguineous marriages within each period.

    Xi7, the number of affected children whose mothers were exposed to radiation or other influences, such as taking certain medications and drugs during pregnancy.

    Beta one through beta seven are the slope parameters in the model, and beta nought represents the constant term.

    ui represents the random error in the model.

    Testing the data and diagnosing multicollinearity.

    To find out the probability distribution that the response variable follows, we used JMP Pro 16.2, and it was found that the dependent variable y follows the Poisson distribution with distribution parameter mu equal to 6.5.

    To verify the suitability of the Poisson distribution for the response variable y, a goodness-of-fit test was conducted for the total number of children with congenital heart and circulatory defects in each period.

    The test shows that the Poisson distribution is the most appropriate distribution for the dependent variable.

    The goodness-of-fit test value is 4958.4579 with a significance level close to zero, as shown in the table in front of you.
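The talk ran this check in JMP Pro and does not say which goodness-of-fit statistic JMP reported. As one possible illustration only, a Pearson chi-square statistic for a fitted Poisson distribution can be computed as below. The function names, the pooling of the upper tail into a single category, and the four-value sample are all assumptions made for this sketch; no p-value is computed because the standard library has no chi-square distribution.

```python
import math
from collections import Counter


def poisson_pmf(k, mu):
    """Poisson probability of observing the count k when the mean is mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def pearson_poisson_gof(sample):
    """Return (mu_hat, chi-square statistic) for a Poisson fit to count data.

    Categories are 0, 1, ..., max-1 plus one pooled upper tail ">= max".
    """
    n = len(sample)
    mu_hat = sum(sample) / n                  # ML estimate of the Poisson mean
    top = max(sample)
    observed = Counter(sample)                # values equal to top fall in the tail
    probs = [poisson_pmf(k, mu_hat) for k in range(top)]
    probs.append(1.0 - sum(probs))            # probability of the pooled tail
    stat = sum((observed.get(k, 0) - n * p) ** 2 / (n * p)
               for k, p in enumerate(probs))
    return mu_hat, stat


mu_hat, chi2 = pearson_poisson_gof([0, 1, 1, 2])
```

The estimated mean plays the role of the distribution parameter mu in the talk, and the statistic measures how far the observed frequencies are from those expected under that fitted distribution.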

    To detect whether there is multicollinearity among the predictor variables under study, we can calculate the correlation matrix between the predictor variables.

    From the figure we observe that the values of the correlation coefficients are significant and large for all predictor variables, as each variable is associated with all the other predictor variables by a strong direct linear correlation.

    This table shows the values of the variance inflation factors.

    The largest of them are those of the predictor variables X2, X3 and X4.

    The variance inflation factor for the remaining predictor variables exceeds the number 18.

    From this we conclude that there is multicollinearity between the predictor variables and the [inaudible 00:16:24].

    Application of the Poisson ridge regression method.

    From the parameter estimates of the Poisson regression model using the ridge regression method, we observe that the total number of children with congenital heart and circulatory defects in each period depends on the increase in all parameters of [inaudible 00:17:11].

    However, most variables are insignificant, namely X1, X4, X5 and X6, because of the effects of semi-perfect multicollinearity.

    Also, the result indicates that the bias parameter is k equal to 0.12.

    [foreign language 00:17:53]

    86.4959.

    We can obtain this table by using JMP.

    Next, we apply the Liu estimator method to estimate the coefficients of the Poisson regression model in the presence of the multicollinearity problem.

    We use a JMP script to connect the R language with JMP.

    This script connects to R and runs, from JMP, the Liu regression package in the R language.

    When applying [inaudible 00:19:12] to the coefficients, we obtained this result.

    We observed that the total number of children with congenital heart and circulatory defects in each period depends on the extent of the increase in all parameters of the model, despite the insignificance of the variable Xi1.

    This is because all variables under study increase the number of children with congenital disabilities, but in varying proportions.

    Also, the result indicates that the Akaike information criterion is 35.29 and the bias parameter is d equal to 4.1.

    When comparing the two methods, ridge regression and the Liu estimator, we note that the Liu estimator approach gives a lower value of the Akaike information criterion.

    It also has more significant coefficients compared to the ridge regression method.
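The comparison criterion can be written out explicitly. The sketch below computes the Poisson log-likelihood and the usual AIC, 2p minus 2 log L, for a vector of fitted means. The function names and the toy numbers are assumptions for illustration, and the talk does not state how the number of parameters p was counted for the two shrinkage estimators.

```python
import math


def poisson_log_likelihood(y, mu):
    """Sum of y*log(mu) - mu - log(y!) over all observations."""
    return sum(yi * math.log(m) - m - math.lgamma(yi + 1)
               for yi, m in zip(y, mu))


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2p - 2 log L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Toy check: three observations, each with fitted mean 1, one parameter.
log_lik = poisson_log_likelihood([0, 1, 2], [1.0, 1.0, 1.0])
aic_value = aic(log_lik, n_params=1)
```

Given the fitted means from two estimation methods on the same data, the method with the smaller AIC value is preferred, which is the rule applied in the comparison above.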

    Conclusions.

    In this paper, we reviewed the most prominent methods of estimating the parameters of the Poisson regression model when the data suffer from the problem of semi-perfect multicollinearity. We took the ridge regression method and the Liu estimator method and compared the two methods using the Akaike information criterion as the criterion for comparison.

    By applying these regression methods, in the presence of a semi-perfect multicollinearity problem, to real data regarding congenital heart and circulatory defects in newborns, obtained from the Central Child Teaching Hospital for the period from 2012 to 2019, we find that the Liu estimator method is more efficient than the ridge regression method because it has a lower Akaike information criterion.

    It  also  gives  more  reliable  results and  more  accurate  p-values.

    Number  3:  Through Liu estimators'  method, it  is  clear  that  all  predictor  variables

    under  study are  influential in  the  regression  model,

    even  if  they  are  not  significant, as  all  parameters  are  influential

    in  increasing  the  number  of  children with  congenital  disabilities  but

    in varying  proportions.

    Thank  you.

    Published on ‎05-20-2024 07:53 AM by | Updated on ‎05-20-2024 08:21 AM
