Parameter estimates in Logistic Regression

Apr 14, 2014 8:27 AM
(2019 views)

I have simulated three data sets of 20M points each for testing LR with binary response variables with very rare events:

1) Two continuous effects: Beta0 = 12, Beta1 = 1, and Beta2 = 1. X1 is Normal(-0.8, 0.25) and X2 is -ABS[Normal(0, 5)]. (Each of the two variables increases the probability of the event, depending upon the value of the variable.) In this case there are about 711,000 positive events in 20M samples. Upon running LR, I get fairly good estimates of the three betas: *b₀* = 12.003,

2) The second data set has one continuous and one categorical effect. For this data, I replace the second continuous variable in the first data set with a binary variable occurring with probability 0.05 and Beta2 = -2. This data contains just 394 events in the total of 20M samples. Upon running LR, the estimates are: *b₀* = 11.0988,

3) For the third data set, both effects are binomially generated binary variables. I replace one continuous variable in the first data set with a binary variable occurring with probability 0.05 and Beta2 = -2. I replace the second continuous variable with another binomially generated binary variable with probability 0.1 and Beta = -1. This data contains just 165 events in the total of 20M samples. Upon running LR, the estimated parameter values are: *b₀* = 10.582,

What am I missing here? Why should the estimates of coefficients of categorical variables be half their true values?

My next tests will under-sample the non-events and then apply the under-sampling correction to Beta0. But first I would like to understand the estimates generated by LR in JMP. Any suggestions or clarification would be greatly appreciated.
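For reference, the intercept correction for under-sampled non-events can be sketched as below. This is only a sketch under stated assumptions: every event is kept, non-events are kept at random with a known fraction, and the intercept is on the log(P(non-event)/P(event)) scale, which is what the expression P(event) = 1/(1+exp(...)) given later in this thread implies. The names `correct_intercept` and `keep_fraction` are hypothetical, not part of JMP.

```python
import math


def correct_intercept(b0_sample, keep_fraction):
    """Undo non-event under-sampling on the intercept.

    Assumes the model is eta = log(P(non-event) / P(event)), all events
    are kept, and each non-event is kept with probability keep_fraction.
    Under-sampling multiplies the non-event odds by keep_fraction, so the
    fitted intercept is shifted by log(keep_fraction).
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    return b0_sample - math.log(keep_fraction)


# Hypothetical check: a true intercept of 12 seen through a 1% sample
# of non-events would be fitted near 12 + log(0.01).
fitted = 12.0 + math.log(0.01)
recovered = correct_intercept(fitted, 0.01)
```

With the more common convention eta = log(P(event)/P(non-event)), the sign of the shift flips, so the correction would add log(keep_fraction) instead.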

6 REPLIES

Apr 14, 2014 8:49 AM
(1812 views)
Posted in reply to message from ranjan_mitre_or 04/14/2014 11:27 AM

I'm not as familiar with JMP, but here are some reasons why it happens in SAS:

1. Check how you coded your binary variable: 0/1 is different from 1/2.

2. If you used some sort of automatic categories, check how it is parameterized, i.e., effect coding or reference coding.

I code the nominal variable as either 0 or the Beta value, and then use these values (whatever they happen to be at each entry) in the P(event) calculation by the logistic expression (1/(1+exp(beta0+Beta1+Beta2))). I assumed that the entry in the effects column would be used as just a 'label' when that column is designated as a Nominal effect. Are you suggesting that the value entered in the column is used somehow, if 0/1 versus 1/2 makes a difference?
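To make the setup concrete, the calculation described above can be sketched roughly as follows. This is a minimal sketch, assuming the third data set's settings from the first post (Beta0 = 12, one binary effect with probability 0.05 and Beta = -2, another with probability 0.1 and Beta = -1). The function names and the small row count are hypothetical and only for illustration.

```python
import math
import random


def event_probability(beta0, contributions):
    """P(event) = 1 / (1 + exp(beta0 + sum of the per-row column entries))."""
    return 1.0 / (1.0 + math.exp(beta0 + sum(contributions)))


def simulate_row(rng, beta0=12.0):
    # Each effect column holds either 0 or its Beta value, as in the post.
    c1 = -2.0 if rng.random() < 0.05 else 0.0
    c2 = -1.0 if rng.random() < 0.10 else 0.0
    p = event_probability(beta0, [c1, c2])
    event = 1 if rng.random() < p else 0
    return c1, c2, event


rng = random.Random(0)  # fixed seed so the sketch is repeatable
rows = [simulate_row(rng) for _ in range(1000)]  # the real file has 20M rows
```

Note that when such a column is declared Nominal, the software sees only two labels (0 and the Beta value), so the fitted coefficient depends on how it codes those two levels internally.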

Apr 14, 2014 9:31 AM
(1812 views)
Posted in reply to message from ranjan_mitre_or 04/14/2014 12:04 PM

http://www.jmp.com/support/help/The_Factor_Models.shtml#65535

Code it as 0/1 and don't include it as a nominal effect, to see if you get the desired results.

See the link above for how JMP codes nominal variables.

I coded the nominal variables as 0/1 as you recommend and then did not use them, and applied the appropriate betas in the P calculation. Now the estimates of the three betas are: 10.52, 0.9848 and 0.6534. So the Beta1 and Beta2 that correspond to the nominal variables still appear to be about half the true value, but the sign is now opposite to what it should be. The true Beta1 of -2 gets estimated as 0.9848 and the true Beta2 of -1 gets estimated as 0.6534. I am attaching a short version of the file if it helps. (If you do have time to run the LR in JMP, the file will have to be extended quite a bit to get enough events. I use a file of 20M rows.)

Thanks

Reeza

The link you sent applies to linear models. I am working with logistic regression.

Apr 16, 2014 12:18 PM
(1812 views)
Posted in reply to message from ranjan_mitre_or 04/16/2014 02:54 PM

How it codes the categorical factor is the issue, not the type of model. Note that there are -1 values in the list.

Try the 1/0 coding.
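To illustrate why the coding matters, here is a rough sketch of converting effect-coded (+1/-1) estimates for two-level factors back to 0/1 indicator coding. It assumes that each reported coefficient belongs to the level coded +1, and that this level is the one simulated as 0 (the baseline); both are assumptions to verify against the JMP documentation linked earlier. The function name is hypothetical.

```python
def effect_to_indicator(intercept, effect_coefs):
    """Convert two-level effect-coded estimates to 0/1 indicator coding.

    Effect coding:    eta = a0 + sum(a_j * e_j), e_j = +1 (baseline) or -1
    Indicator coding: eta = b0 + sum(b_j * z_j), z_j = 0 (baseline) or 1
    Matching the two at each level gives b0 = a0 + sum(a_j), b_j = -2 * a_j.
    """
    baseline_intercept = intercept + sum(effect_coefs)
    indicator_coefs = [-2.0 * a for a in effect_coefs]
    return baseline_intercept, indicator_coefs


# Estimates quoted in this thread: 10.52, 0.9848 and 0.6534
b0, (b1, b2) = effect_to_indicator(10.52, [0.9848, 0.6534])
print(round(b0, 4), round(b1, 4), round(b2, 4))
# prints: 12.1582 -1.9696 -1.3068
```

Under those assumptions the converted values land close to the simulated 12, -2 and -1, which would be consistent with the "half the true value" pattern being a coding effect and not an estimation problem.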

ranjan@mitre.org, Apr 14, 2014 10:44 AM:

> I coded the nominal variables as 0/1 as you recommend and then did not use them, and applied the appropriate betas in the P calculation.

I don't know what that means...

You can always try contacting tech support, if you're not getting an answer on here. They have people more familiar with JMP.