Best approach to fit large datasets

Mar 11, 2015 11:58 PM

Hi,

I often need to fit datasets in which the data is segmented into different groups. In the simplest case, I just want a linear fit for each group of a few dozen data points, but the dataset contains a large number of such groups.

In the past I have used the Bivariate platform (Fit Y by X) with the "Group by..." option to get a separate fit for each group. For large datasets, however, the fitting is very slow, and it seems like the wrong approach. Another option is the Fit Model platform with the "By" option to separate the groups, but that also seems fairly slow.

What is the best strategy for fitting large datasets like this efficiently, especially when the raw data plots for each fit are not needed?

1 REPLY


Mar 12, 2015 6:33 AM

Hi danly,

Try the Fit Model approach again, but with your grouping variable as a factor in the model rather than as a 'By' variable. Also include a term in the model for an interaction between the grouping variable and your 'X' variable, i.e. something like 'X * Group'. So your model factors would be 'X', 'Group', and 'X * Group'.

When you use the 'By' option, JMP effectively fits a separate model for each group, which will be slow. If you set the group as a factor in the model, JMP fits one model for the whole dataset, and the output will include a table of the relevant linear fit coefficients by group. You'll also be able to easily compare whether the fit for one group is significantly different from the fit for another group, via the Estimates -> Multiple Comparisons option.
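To illustrate why a single model with 'X', 'Group', and 'X * Group' terms gives each group its own line: that model allows a separate intercept and slope per group, so the fitted line for each group matches what a separate least-squares fit on that group would give. Outside JMP, the same per-group coefficients can be sketched in plain Python with one pass over the data and no plotting. This is only an illustration, not how JMP computes its fit; the function name `fit_lines_by_group` and the sample rows are made up for the example.

```python
from collections import defaultdict


def fit_lines_by_group(rows):
    """Return {group: (intercept, slope)} for rows of (group, x, y).

    Hypothetical helper: accumulates running sums per group in a single
    pass, then applies the closed-form simple linear regression formulas.
    """
    # Per-group running sums: n, sum(x), sum(y), sum(x*x), sum(x*y)
    sums = defaultdict(lambda: [0, 0.0, 0.0, 0.0, 0.0])
    for group, x, y in rows:
        s = sums[group]
        s[0] += 1
        s[1] += x
        s[2] += y
        s[3] += x * x
        s[4] += x * y

    fits = {}
    for group, (n, sx, sy, sxx, sxy) in sums.items():
        denom = n * sxx - sx * sx
        if denom == 0:
            # Fewer than two distinct x values: no unique line exists.
            continue
        slope = (n * sxy - sx * sy) / denom
        intercept = (sy - slope * sx) / n
        fits[group] = (intercept, slope)
    return fits


# Made-up sample data: group A follows y = 1 + 2x, group B follows y = 10 - x.
rows = [
    ("A", 0, 1), ("A", 1, 3), ("A", 2, 5),
    ("B", 0, 10), ("B", 1, 9), ("B", 2, 8),
]
fits = fit_lines_by_group(rows)
for group, (intercept, slope) in sorted(fits.items()):
    print(group, intercept, slope)
```

The raw-sum formulas are compact but can lose precision when X values are large relative to their spread; centering X within each group first is a common remedy. A sketch like this yields only the coefficients, not the standard errors or the group comparisons that the single-model report provides.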