JMP User Community: Discussions

## Is there any sample data set for testing Logistic Regression with a large number (10-20) of independent variables?

Mar 24, 2014 1:28 PM

I have a problem with tens of effects (independent variables) and one categorical dependent variable. The probability that the categorical variable takes the value "1" is very small, about 10 in a million. Consequently, I have tens of millions of data points, from which I would like to estimate a model using Logistic Regression that gives me the coefficients of the independent variables, so that I can compute the probability that the dependent variable is "1".
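A quick check on the figures in the question: with an event probability p of about 10 in a million, the expected number of "1" outcomes among N rows is Np. The row counts below are illustrative values in the "tens of millions" range, not numbers from the post.

```latex
\begin{aligned}
\mathbb{E}\bigl[\#\{i : y_i = 1\}\bigr] &= N p, \qquad p \approx 10^{-5},\\
N = 10^{7} &\;\Rightarrow\; N p \approx 100,\\
N = 5 \times 10^{7} &\;\Rightarrow\; N p \approx 500.
\end{aligned}
```

So even a very large table holds only a few hundred events, and that count, more than the raw number of rows, tends to govern how precisely the coefficients can be estimated.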

Before I run Logistic Regression and just take whatever coefficients it spits out, I would like to see how well JMP handles a problem of such dimensionality. Is anyone aware of a test data set that I could use to test JMP on a problem of similar size?

Thank you for the help.
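If no public data set of the right shape turns up, another option (not suggested in the thread) is to simulate one, so the true coefficients are known in advance and you can check how closely the JMP fit recovers them. Below is a minimal Python sketch under stated assumptions: the predictors are independent standard normals, the 0/1 response is drawn from a logistic model, and the function names, the X1..Xk/Y column names, and the CSV layout are hypothetical choices for illustration.

```python
import csv
import math
import random


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0 here, so exp(z) cannot overflow
    return e / (1.0 + e)


def simulate_logistic(n_rows, coefs, intercept, seed=0):
    """Return (x, y) pairs drawn from a logistic model with known parameters.

    Each x is a list of independent standard-normal predictors, one per
    coefficient; y is 1 with probability
    sigmoid(intercept + sum(coefs[j] * x[j])) and 0 otherwise.
    """
    rng = random.Random(seed)  # fixed seed makes the data reproducible
    rows = []
    for _ in range(n_rows):
        x = [rng.gauss(0.0, 1.0) for _ in coefs]
        eta = intercept + sum(b * xj for b, xj in zip(coefs, x))
        y = 1 if rng.random() < sigmoid(eta) else 0
        rows.append((x, y))
    return rows


def write_csv(rows, f):
    """Write rows to an open text file as CSV with header X1..Xk, Y."""
    k = len(rows[0][0]) if rows else 0
    writer = csv.writer(f)
    writer.writerow([f"X{j + 1}" for j in range(k)] + ["Y"])
    for x, y in rows:
        writer.writerow(x + [y])
```

For example, `simulate_logistic(n, [0.3] * 15, math.log(1e-5 / (1 - 1e-5)))` gives about 10 events per million when all predictors are 0. The observed event rate will come out higher (roughly twice that for these coefficients), because spreading the log-odds raises the average probability, so tune the intercept if you need a specific rate. Pure Python is slow at tens of millions of rows, so start with a smaller n, write the result with `write_csv` to a file opened with `newline=""`, and open the CSV in JMP.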

3 Replies

## Re: Is there any sample data set for testing Logistic Regression with a large number (10-20) of independent variables?

Try Kaggle.com, though it probably violates their terms of service (TOS).

## Re: Is there any sample data set for testing Logistic Regression with a large number (10-20) of independent variables?

You might want to download the data from the link given in the paper at: http://www.sascommunity.org/wiki/Expert_Panel_Solution_MWSUG_2013-Tabachneck

The link points to a dataset that MWSUG provided last year (2013) as part of an expert panel I was on. It has around 1 million records and about 20 independent variables.

## Re: Is there any sample data set for testing Logistic Regression with a large number (10-20) of independent variables?

Alternatively, use an existing JMP sample data set and expand it to the desired size.

For the table produced by the script below (more than 16 million rows, 13 independent variables), JMP 11 returns the parameter estimates in less than 30 s.

```jsl
// Open the Body Fat sample table that ships with JMP
dt = Open( "$SAMPLE_DATA/Body Fat.jmp" );

// Add a binary nominal response column with rare 1s (1 = age over 70)
dt << New Column( "Over 70", Numeric, Nominal,
    Values( (Column( "Age(years)" ) << Get Values) > 70 )
);

// Delete redundant columns: the saved prediction formulas, the validation
// column, and Age itself (which would perfectly predict the response)
dt << Delete Columns(
    Eval(
        (dt << Get Column Group( "Prediction Formulas" )) ||
        {Column( "Validation" ), Column( "Age(years)" )}
    )
);

// Store the names of the remaining columns; Over 70 was added last
cols = dt << Get Column Names();

// Make dataset bigger! Each pass appends a copy of the table to itself,
// doubling the row count until there are at least 1e7 rows
While( N Row( dt ) < 1e7, dt << Concatenate( dt, Append to First Table ) );
Wait( 0 );

// Fit a nominal logistic model using every column except the last (Over 70)
Fit Model(
    Y( :Over 70 ),
    Effects( Eval( cols[1 :: N Items( cols ) - 1] ) ),
    Personality( Nominal Logistic ),
    Run( Likelihood Ratio Tests( 1 ), Wald Tests( 0 ) )
);
```
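The ">16 million rows" figure follows from the doubling loop: each pass of `Concatenate( dt, Append to First Table )` appends a full copy of the table to itself. A small Python sketch of that arithmetic, assuming the table starts with 252 rows (the size of the classic body-fat data set; check `N Row( dt )` on your copy), shows the loop needs 16 passes and stops at 16,515,072 rows. The function names are hypothetical.

```python
import math


def doubling_passes(start_rows, target_rows):
    """Count passes of a 'while rows < target: rows *= 2' loop.

    Mirrors While( N Row( dt ) < target, dt << Concatenate( dt, ... ) ),
    where every pass appends a copy of the table to itself.
    Returns (passes, final_rows).
    """
    if start_rows <= 0:
        raise ValueError("start_rows must be positive")
    rows, passes = start_rows, 0
    while rows < target_rows:
        rows *= 2
        passes += 1
    return passes, rows


def se_shrink_factor(passes):
    """Factor by which standard errors shrink when every row is copied 2**passes times."""
    return math.sqrt(2 ** passes)


passes, final_rows = doubling_passes(252, 1e7)
print(passes, final_rows)        # 16 16515072
print(se_shrink_factor(passes))  # 256.0
```

One consequence worth keeping in mind: every original row ends up repeated exactly 2^16 = 65,536 times, so the log-likelihood of the expanded table is the original log-likelihood multiplied by 65,536. The maximizing coefficients are therefore the same as for the original table, and only the reported standard errors shrink, by a factor of sqrt(65,536) = 256. The expanded table is a test of speed and memory at the target size rather than of estimation accuracy on genuinely new data.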
