Logistic Regression Modeling

May 9, 2011 9:04 AM
(3493 views)

I'm attempting to do some logistic regression modeling on a huge dataset consisting of customer behavior (Good/Bad) and demographic predictors (income, gender, etc.). I have almost 45K records but there is a lot of missing data (only about 23K complete records).

I would like to use the Validate feature in Fit Model but can't seem to get it to work. I have a column that contains only 3 unique values ("Training", "Validation", "Test") which I enter as the Validation column but it seems that it just continues to fit all the data.

I was also wondering why I always get significant lack of fit in my models. Is this simply a matter of the huge sample size?

Thanks for any insight!

Re: Logistic Regression Modeling

What platform are you using to do logistic regression, and do you have JMP Pro, JMP 9, or JMP 8?

Re: Logistic Regression Modeling

mewing,

Thanks for the response! I'm using JMP 9 and am using the Fit Model platform. I've tried several different models (main effects only and main effects plus two-factor interactions for different subsets of the predictors) and the lack of fit is always significant.

Re: Logistic Regression Modeling

By the sound of it, I suspect that the significance of the lack of fit *is* due to the sample size. If, for some particular combination of demographic factors, the model as a whole consistently over- or underestimates the observed response rate, that combination contributes substantially to the lack-of-fit statistic, and an exceptionally large number of data points then pushes its significance sky-high. All that's really saying is that you have an awful lot of evidence that the model isn't exactly right (which you probably knew anyway).
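
To see the sample-size effect concretely, here is a small Python sketch with made-up numbers: four hypothetical demographic cells whose predicted response rates are each off by one percentage point. It computes a likelihood-ratio (G²) lack-of-fit statistic, which for fixed rates grows in proportion to the number of records, and converts it to a p-value. The cell rates, the equal cell sizes, and the 2 degrees of freedom are assumptions for illustration only; JMP's own Lack of Fit report is computed from your actual model and data.

```python
import math


def lr_lack_of_fit(n_per_cell, observed, predicted):
    """Likelihood-ratio (G^2) lack-of-fit statistic for cells with a binary response.

    observed/predicted are per-cell response rates; every cell has n_per_cell rows.
    """
    g2 = 0.0
    for p_obs, p_hat in zip(observed, predicted):
        g2 += 2 * n_per_cell * (
            p_obs * math.log(p_obs / p_hat)
            + (1 - p_obs) * math.log((1 - p_obs) / (1 - p_hat))
        )
    return g2


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X > x); this closed form needs an even df."""
    if df % 2:
        raise ValueError("df must be even for this closed form")
    half = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


# Hypothetical cells: each predicted rate is off by one percentage point.
OBSERVED = [0.20, 0.31, 0.40, 0.52]
PREDICTED = [0.21, 0.30, 0.41, 0.51]

for n_records in (400, 4_000, 23_000):
    n_cell = n_records / len(OBSERVED)
    g2 = lr_lack_of_fit(n_cell, OBSERVED, PREDICTED)
    p = chi2_sf_even_df(g2, df=2)  # df=2 is an assumption for illustration
    print(f"{n_records:>6} records: G^2 = {g2:6.2f}, p = {p:.4f}")
```

With these assumed numbers the p-value goes from far above 0.05 at 400 records to far below it at 23,000 records, even though the discrepancy in each cell never changes.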

Probably of greater interest than the lack of fit itself would be the percentage difference between the observed and expected response rates: if it's really very small, the significance of the lack of fit probably wouldn't matter much. I'm afraid I'm not familiar with the "Validate" feature yet, but I'll try it out myself with some data of my own and see what happens.
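
A rough way to make that observed-versus-expected comparison is to group the rows and compare each group's observed response rate with the average predicted probability for that group (for example, after saving the model's predicted probabilities to the data table). A minimal Python sketch, assuming each row is a hypothetical (group, y, p_hat) tuple with y coded 1 for the response of interest:

```python
from collections import defaultdict


def calibration_table(rows):
    """Compare observed and mean predicted response rates within each group.

    rows: iterable of (group, y, p_hat), where y is 1 for the response of
    interest (e.g. "Bad") and p_hat is the model's predicted probability of it.
    Returns {group: (n, observed_rate, mean_predicted_rate, difference_in_points)}.
    """
    count = defaultdict(int)
    y_total = defaultdict(float)
    p_total = defaultdict(float)
    for group, y, p_hat in rows:
        count[group] += 1
        y_total[group] += y
        p_total[group] += p_hat
    table = {}
    for group, n in count.items():
        observed = y_total[group] / n
        predicted = p_total[group] / n
        table[group] = (n, observed, predicted, 100 * (observed - predicted))
    return table


# Hypothetical scored rows: (gender, outcome, predicted probability).
rows = [
    ("F", 1, 0.30), ("F", 0, 0.30), ("F", 0, 0.30), ("F", 0, 0.30),
    ("M", 1, 0.45), ("M", 1, 0.45), ("M", 0, 0.45), ("M", 0, 0.45),
]
for group, (n, obs, pred, diff) in sorted(calibration_table(rows).items()):
    print(f"{group}: n={n}, observed={obs:.2f}, predicted={pred:.2f}, diff={diff:+.1f} points")
```

Differences of a fraction of a point in every group would suggest the model is practically adequate even if the formal lack-of-fit test is highly significant.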

Re: Logistic Regression Modeling

David,

Thanks for the reply!

Re: Logistic Regression Modeling

David is correct: huge sample sizes usually render most statistical tests of assumptions useless, because the tests have enough power to detect even trivially small departures. In grad school I ran an experiment on processing efficiency in the R language and gathered so much data that I couldn't use any tests for normality, homoskedasticity, etc., and had to rely on visual assessments instead.

As for the Fit Model platform: I do not see any place to specify a training/validation identifier column. Searching through the JMP help files, it appears that there should be a validation area for Logistic Regression in the Fit Model platform, but I cannot find it. The "JMP Pro Features" site says that the validation column role in many modeling platforms is exclusive to JMP Pro, Version 9.

If you can tell me how you're specifying the validation column, I may be able to help further.

Re: Logistic Regression Modeling

The Validation column in Fit Model Logistic is only available in JMP 9 Pro. But you can still exclude some rows from the fitting process, and use those rows later to assess model fit.
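
Excluding rows and scoring them afterwards amounts to a manual holdout validation. The Python sketch below mimics it on simulated data: it uses a column of "Training"/"Validation"/"Test" labels like the one in the original question, fits a one-predictor logistic regression by gradient ascent on the Training rows only, and then scores the Validation rows it never saw. The simulated data, the single predictor, and the fitting routine are illustrative assumptions, not what JMP does internally.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=1.0, iters=1000):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by batch gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(iters):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            r = y - sigmoid(b0 + b1 * x)  # residual on the probability scale
            g0 += r
            g1 += r * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def misclassification_rate(b0, b1, xs, ys):
    """Share of rows whose 0.5-threshold prediction disagrees with the observed y."""
    wrong = sum((sigmoid(b0 + b1 * x) >= 0.5) != (y == 1) for x, y in zip(xs, ys))
    return wrong / len(xs)


# Simulated data with a role column like the one in the question.
rng = random.Random(0)
rows = []
for _ in range(1000):
    x = rng.uniform(-3, 3)
    y = 1 if rng.random() < sigmoid(2 * x) else 0
    role = rng.choices(["Training", "Validation", "Test"], weights=[60, 20, 20])[0]
    rows.append((x, y, role))


def subset(role):
    """Return the (xs, ys) of the rows carrying the given role label."""
    picked = [(x, y) for x, y, r in rows if r == role]
    return [x for x, _ in picked], [y for _, y in picked]


train_x, train_y = subset("Training")
val_x, val_y = subset("Validation")
b0, b1 = fit_logistic(train_x, train_y)  # Validation and Test rows are never seen here
print(f"fitted: b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"training misclassification:   {misclassification_rate(b0, b1, train_x, train_y):.3f}")
print(f"validation misclassification: {misclassification_rate(b0, b1, val_x, val_y):.3f}")
```

The Test rows are left untouched here; the usual idea is to keep them for one final check after you have settled on a model using the Validation rows.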

If you only have access to Standard JMP and want some sort of *automated* validation, the Partition platform will work. You can specify a Validation Portion or use K-Fold CrossValidation. The Partition platform doesn't produce a neat prediction equation (it gives a set of rules instead), but it can produce good predictions and it lets you use validation. It's also easy to assess the importance of the X variables with the Column Contributions option.
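
K-fold crossvalidation splits the rows into K disjoint folds, fits on K-1 of them, scores the fold that was left out, and repeats until every row has been held out exactly once. The Python sketch below shows that loop around a single-split decision stump, a deliberately simplified, hypothetical stand-in for a Partition tree; the data are simulated and the function names are made up for illustration.

```python
import random


def fit_stump(xs, ys):
    """Choose the single split "x < t" with the fewest training misclassifications.

    A one-split, hypothetical stand-in for a Partition tree.
    Returns (t, label_if_below, label_if_at_or_above).
    """
    def majority(labels):
        return 1 if 2 * sum(labels) >= len(labels) else 0

    best = None
    for t in sorted(set(xs)):
        below = [y for x, y in zip(xs, ys) if x < t]
        above = [y for x, y in zip(xs, ys) if x >= t]
        lo, hi = majority(below), majority(above)
        errors = sum(y != lo for y in below) + sum(y != hi for y in above)
        if best is None or errors < best[0]:
            best = (errors, t, lo, hi)
    return best[1:]


def k_fold_error(xs, ys, k=5, seed=1):
    """K-fold crossvalidated misclassification rate of the stump."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering every row once
    wrong = 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        t, lo, hi = fit_stump([xs[i] for i in train], [ys[i] for i in train])
        wrong += sum((lo if xs[i] < t else hi) != ys[i] for i in held_out)
    return wrong / len(xs)


# Simulated data: the response switches at x = 6, with about 10% of labels flipped.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(300)]
ys = [int((x >= 6) != (rng.random() < 0.1)) for x in xs]
print(f"5-fold crossvalidated misclassification: {k_fold_error(xs, ys):.3f}")
```

Because every row is scored by a model that never saw it, the crossvalidated error is an honest estimate even when no separate validation set is set aside.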

The Neural platform in Standard JMP also provides validation. If being able to easily interpret the model coefficients is important, Neural may not be what you want. But, like Partition, it can produce good predictions as long as there is structure in the data to be modeled.

The Neural and Partition platforms are intended for large amounts of data, which you have. If you use those two platforms, I strongly recommend using the Validation features, since those two platforms can easily overfit without validation. In fact, with Neural, you have to use validation.
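
To illustrate why validation matters for very flexible models, the Python sketch below uses a 1-nearest-neighbor rule (not a neural net or a tree, just the simplest model that can memorize its training data) on simulated data with 20% label noise. It scores perfectly on the rows it was fit to, but its error on held-out rows is far higher, and only the held-out rows reveal that. The data and function names are illustrative assumptions.

```python
import random


def one_nn_predict(train, x):
    """Return the label of the training row whose x is closest to the query x."""
    return min(train, key=lambda row: abs(row[0] - x))[1]


def error_rate(train, rows):
    """Misclassification rate of the 1-nearest-neighbor rule on the given rows."""
    return sum(one_nn_predict(train, x) != y for x, y in rows) / len(rows)


def simulate(n, rng, noise=0.2):
    """Rows (x, y) where y = [x >= 5], with a fraction `noise` of labels flipped."""
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        rows.append((x, int((x >= 5) != (rng.random() < noise))))
    return rows


rng = random.Random(42)
train = simulate(200, rng)
validation = simulate(200, rng)
print(f"training error:   {error_rate(train, train):.3f}")  # memorized: every row is its own neighbor
print(f"validation error: {error_rate(train, validation):.3f}")
```

The training error is zero only because each training row is its own nearest neighbor; with 20% noise, no model can expect much better than about 20% error on new rows, so the perfect training score is pure overfitting.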
